We present a primal-dual approximation algorithm for minimizing the leakage power of an integrated circuit by assigning gate threshold voltages. While most existing techniques do not provide a performance guarantee, we prove an upper bound on the power consumption. The algorithm is practical and works with an industrial sign-off timer. It can be used for post-routing power reduction or for optimizing leakage power throughout the design flow. We demonstrate the practical performance on recent microprocessor units. Our implementation obtains significant leakage power reductions of up to 8% on top of one of the most successful algorithms for gate sizing and threshold voltage optimization. After timing-aware global routing we achieve leakage power reductions of up to 34%.
Reliability is an important automotive functional safety property, and the reliability requirement of each safety-critical automotive function must be assured. Pre-assigning reliability values to unassigned tasks, by transferring the reliability requirement of the function to each task, is a useful reliability requirement assurance approach proposed in recent years. However, the pre-assigned reliability values in state-of-the-art studies are overly pessimistic, making the reliability requirement assurance ineffective and thereby limiting the reduction in response time. This study presents the geometric mean-based non-fault-tolerant reliability pre-assignment (GMNRP) and geometric mean-based fault-tolerant reliability pre-assignment (GMFRP) approaches, in which geometric mean-based reliability values are pre-assigned to unassigned tasks. The geometric mean draws the pre-assigned reliability values of the unassigned tasks toward a central tendency. Experimental results show that GMNRP and GMFRP effectively reduce the response time compared with their respective state-of-the-art counterparts.
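The geometric-mean idea can be sketched in a few lines. This is only an illustrative reconstruction, not the paper's GMNRP/GMFRP procedure: it assumes a function composed of serially executed tasks, whose reliability is the product of the task reliabilities, and the helper name is hypothetical.

```python
def gm_preassign(r_req, assigned, n_unassigned):
    """Pre-assign one reliability value to every unassigned task so that the
    product over all tasks exactly meets the function's requirement r_req.

    Assumes serial composition: function reliability = product of task
    reliabilities (an illustrative assumption, not taken from the paper)."""
    achieved = 1.0
    for r in assigned:
        achieved *= r
    remaining = r_req / achieved  # requirement left for the unassigned tasks
    # geometric mean: give every unassigned task the same value, so their
    # product equals the remaining requirement without over- or under-shooting
    return remaining ** (1.0 / n_unassigned)
```

Because every unassigned task receives the same (geometric-mean) value, no single task is burdened with a pessimistically high reliability target, which is the central-tendency effect the abstract refers to.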
Three-dimensional (3D) integration enables the design of high-performance and energy-efficient network-on-chip (NoC) architectures as communication backbones for manycore chips. To exploit the benefits of the vertical dimension of 3D integration, through-silicon vias (TSVs) have been predominantly used in state-of-the-art manycore chip design. However, for TSV-based systems, high power density and the resultant thermal hotspots remain major concerns from the perspectives of chip functionality and overall reliability. The power consumption and thermal profiles of 3D NoCs can be improved by incorporating a Voltage-Frequency-Island (VFI)-based power management strategy. However, due to the inherent thermal constraints of a TSV-based 3D system, the benefits offered by this power management methodology cannot be fully exploited. In this context, the emergence of monolithic 3D (M3D) integration has opened up new possibilities for designing ultra-low-power and high-performance circuits and systems. The smaller dimensions of the inter-layer dielectric (ILD) and monolithic inter-tier vias (MIVs) offer high-density integration, flexibility in partitioning logic blocks across multiple tiers, and a significant reduction in total wire length. In this work, we present the first study of the performance-thermal trade-offs of energy-efficient monolithic 3D manycore chips. In particular, we present a comparative performance evaluation of M3D NoCs with respect to their conventional TSV-based counterparts. We demonstrate that the proposed M3D-based NoC architecture incorporating VFI-based power management achieves up to 29.4% lower energy-delay product (EDP) compared to TSV-based designs for a large set of benchmarks. We also demonstrate that the M3D-based NoC shows up to 29.1% lower maximum temperature than the TSV-based counterpart for these benchmarks.
Earlier works observed that certain primary inputs have preferred values, which help increase the gate-level fault coverage when they appear in a functional test sequence. This paper observes that multiplexers present additional opportunities for increasing the fault coverage of a functional test sequence that are not captured by preferred primary input values. Because multiplexers are prevalent, their effect on the fault coverage can be significant. A static analysis that is independent of any functional test sequence is performed in this paper to identify preferred values for the outputs of multiplexers. This is followed by a dynamic analysis that adjusts the select inputs of the multiplexers for a given functional test sequence in order to ensure that the preferred values appear on the outputs of the multiplexers more often. The analysis yields design-for-testability logic for the select inputs of the multiplexers that have preferred values. The logic is independent of the functional test sequence, and it allows the fault coverage to be increased when the select inputs are not primary inputs, or when the same select inputs are used for different multiplexers. Experimental results are presented to demonstrate that this approach has a significant effect on the fault coverage of functional test sequences.
Typical modern HW designs include many blocks associated with thousands of design properties. Having today's commercial formal verifiers utilize a complementary set of state-of-the-art formal algorithms is key to enabling formal verification tools to successfully cope with verification problems of different sizes, types, and complexities. Formal engine orchestration is the methodology used to select the most appropriate formal engine for a specific verification problem. It ensures proper scheduling of the formal engines to minimize the time consumed solving individual design verification problems, and hence strongly impacts the time required to verify the overall design properties. This work proposes the use of supervised machine learning classification techniques to guide the orchestration step by predicting the formal engines that should be assigned to a design property. Up to 16,500 formal verification runs on RTL designs and their properties are used to train the classifier and create a prediction model. The classifier assigns any new verification problem to an appropriate list of formal engines, together with a probability distribution over the set of engine classes. Our results indicate that the proposed model is able to improve the formal suite's total run time by up to 98% of its maximum achievable improvement using multi-classification-based orchestration, and to nominate the appropriate formal engines for new-to-verify HW designs with 88% accuracy.
Optimizing for routability during FPGA placement is becoming increasingly important, as failure to spread congestion throughout the chip, especially in the case of large designs, may result in placements that either cannot be routed or that require the router to work excessively hard to succeed. In this paper, we introduce a new analytic routability-aware placement algorithm for Xilinx UltraScale FPGA architectures. The proposed algorithm, called GPlace3.0, seeks to optimize both wirelength and routability. Our work contains several unique features, including a novel window-based procedure for satisfying legality constraints in lieu of packing, an accurate congestion estimation method based on modifications to the pathfinder global router, and a novel detailed placement algorithm that optimizes both wirelength and external pin count. Experimental results show that compared to the top three winners of the recent ISPD'16 FPGA placement contest, GPlace3.0 is able to achieve (on average) a 7.53%, 15.15%, and 33.50% reduction in routed wirelength, respectively, while requiring less overall runtime. In addition, 360 further benchmarks were provided directly by Xilinx Inc. These benchmarks were used to compare GPlace3.0 to the most recently improved versions of the first- and second-place contest winners. Subsequent experimental results show that GPlace3.0 is able to outperform the improved placers in a variety of areas, including the number of best solutions found, the fewest benchmarks that cannot be routed, the runtime required to perform placement, and the runtime required to perform routing.
Power consumption is identified as one of the main complications in designing practical wearable systems, mainly due to their stringent resource limitations. When designing wearable technologies, several system-level design choices, which directly contribute to the energy consumption of these systems, must be considered. In this paper, we propose a lightweight system optimization framework that trades off power consumption and performance in connected wearable motion sensors. While existing approaches focus exclusively on one or a few specific design variables, our framework holistically finds the optimal power-performance solution with respect to the specified application need. This is formulated as a multivariate non-convex optimization problem and is therefore hard to solve. To decrease the complexity, we propose a smoothing function that reduces this optimization to a convex problem. The reduced optimization is then solved in linear time using a devised derivative-free optimization approach, namely cyclic coordinate search. We evaluate our framework against several holistic optimization baselines using a real-world wearable activity recognition dataset. We minimize the energy consumption for various activity recognition performance thresholds ranging from 40% to 80% and demonstrate up to 64% energy savings.
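Cyclic coordinate search, the derivative-free method named above, can be sketched generically. This is a textbook-style illustration on a toy convex objective; the paper's actual power-performance objective and its smoothing function are not reproduced here, and the step-shrinking schedule is an assumption.

```python
def cyclic_coordinate_search(f, x0, step=0.5, tol=1e-6, max_iter=1000):
    """Derivative-free minimization: cycle through the coordinates, probing
    +step and -step along each; when a full cycle yields no improvement,
    halve the step, stopping once it falls below tol."""
    x = list(x0)
    for _ in range(max_iter):
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                cand = x[:]
                cand[i] += delta
                if f(cand) < f(x):  # accept any improving coordinate move
                    x = cand
                    improved = True
                    break
        if not improved:
            step *= 0.5
            if step < tol:
                break
    return x

# toy convex surrogate: minimum at (1, -2)
xmin = cyclic_coordinate_search(lambda v: (v[0] - 1) ** 2 + (v[1] + 2) ** 2,
                                [0.0, 0.0])
```

Because only function evaluations are needed, the same loop works whether the objective comes from a model or from measured power numbers, which is what makes coordinate search attractive for this kind of system-level tuning.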
Accounting for all operating conditions of a system at the design stage is typically infeasible for complex systems. Monitoring and verifying system requirements at runtime enables a system to continuously and introspectively ensure that it is operating correctly in the presence of dynamic execution scenarios. In this paper, we present a requirements-driven methodology enabling efficient runtime monitoring of embedded systems. The proposed approach extracts a runtime monitoring graph from system requirements specified using UML sequence diagrams. Non-intrusive, on-chip hardware dynamically monitors the system execution, verifies that the execution adheres to the requirements model, and, in the event of a failure, provides detailed information that can be analyzed to determine the root cause. Using case studies of autonomous vehicle and pacemaker prototypes, we analyze the relationship between event coverage, detection rate, and hardware requirements.
Real-time data analytics for smart-grid energy management is challenging when both occupant behavior profiles and energy profiles must be considered. This paper proposes a distributed and networked machine learning platform on a smart-gateway-based smart grid. It can analyze occupant behaviors, provide short-term load forecasting, and allocate renewable energy resources. Firstly, the occupant behavior profile is captured by a real-time indoor positioning system with WiFi data analytics, and the energy profile is extracted by a real-time metering system with electricity load data analytics. Then, the 24-hour occupant behavior profile and energy profile are fused and predicted using an online distributed machine learning algorithm with real-time data updates. Based on the forecasted occupant behavior profile and energy profile, solar energy is allocated to reduce peak demand on the main electricity grid. The whole management flow can be operated on the distributed smart-gateway network with limited computational resources but with support for a general machine-learning engine. Experimental results on occupant behavior extraction show that the proposed algorithm can achieve 50x and 38x speed-ups during data testing and training, respectively, with comparable indoor positioning accuracy, compared to the traditional support vector machine (SVM) method. Furthermore, for short-term load forecasting, it is 14.83% more accurate compared to SVM-based data analytics. Based on the predicted occupant behavior profile and energy profile, our proposed energy management system (EMS) can achieve 19.66% more peak load reduction and 26.41% more cost saving compared to the SVM-based method.
The development of cyber-physical systems and the Internet of Things (IoT) have a significant potential to improve the effectiveness of assistive technologies for those with physical disabilities. To be practical, assistive systems should minimize the number of inputs from users, reducing the cognitive and physical effort required. This paper presents a versatile, energy-efficient framework and algorithm for assistive indoor navigation using an electric wheelchair and user inputs from multiple modalities. The proposed algorithm automates indoor navigation using only a few user commands captured through a wearable device, with the goal of simplifying navigation tasks and making them more instinctive for the user. We evaluated the proposed methodology using both a virtual smart building and a prototype built with off-the-shelf IoT development boards. Our evaluations for three different floorplans show one order of magnitude reduction in user effort and communication energy required for navigation when compared with conventional navigation methodologies that require continuous user inputs.
In this article, we propose a novel approach to detect the occupancy behavior of a building through temperature and/or possible heat source information, which can be used for energy reduction and security monitoring in emerging smart buildings. Our work is based on a realistic building simulation program, EnergyPlus, which can model the various time-series inputs to a building such as ambient temperature and heating, ventilation, and air-conditioning (HVAC) inputs. Two machine learning based approaches for detecting human occupancy of a smart building are applied herein, namely the support vector regression (SVR) method and the recurrent neural network (RNN) method. Experimental results with the SVR method show that the 4-feature model provides an accurate detection rate, with a 0.638 average error and 0.0532 error ratio, while the 5-feature model gives a 0.317 average error and 0.0264 error ratio. This indicates that SVR is a viable option for occupancy detection. In the RNN method, Elman's RNN (ELNN) can estimate the occupancy of each room of a building with high accuracy. The error level, in terms of the number of people, can be as low as 0.0056 on average and 0.288 at maximum, considering ambient temperature, room temperatures, and HVAC powers as detectable information. Our study further shows that both methods deliver similar accuracy in occupancy detection, but the SVR model is more stable under changing system features, while the RNN method delivers higher accuracy when the features used in the model do not change much.
Owing to the high cell density enabled by advanced manufacturing processes, the reliability of flash drives has become a major challenge in flash system design. In order to enhance the reliability of flash drives, error-correcting codes (ECC) have been widely utilized to correct error bits when programming/reading data to/from flash drives. Although ECC can effectively enhance the reliability of flash drives by correcting error bits, its capability degrades as the number of program/erase (P/E) cycles of flash blocks increases. Eventually, ECC fails to correct a flash page because the page contains too many error bits. As a result, reducing error bits is an effective way to further improve the reliability of flash drives when a specific ECC is adopted. This work focuses on how to reduce the probability of producing error bits in a flash page. Thus, we propose a pattern-aware write strategy for flash reliability enhancement. The proposed write strategy considers both the P/E cycle count of blocks and the pattern of the written data when a flash block is allocated to store the data. Since the proposed write strategy allocates young blocks (resp. old blocks) for hot data (resp. cold data) and flips the bit pattern of the written data to a more appropriate one, it can effectively improve the reliability of flash drives. The experimental results show that the proposed strategy can reduce the number of error pages by up to 50%, compared with the well-known DFTL solution. Moreover, the proposed strategy is orthogonal to all ECC mechanisms, so the reliability of flash drives with ECC can be further improved by the proposed strategy.
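The pattern-flipping half of the strategy can be illustrated with a hypothetical flip encoding. This is not the paper's allocator, and the assumption that one bit value (here 0) is the more error-prone "stressful" state is illustrative only; the sketch just shows how a one-bit flip flag bounds the number of stressful cells per page.

```python
def encode_page(bits, stress_bit=0):
    """If the stressful bit value would dominate the page, store the inverted
    pattern plus a one-bit flip flag, so at most half the cells hold the
    stressful value. Returns (stored_bits, flip_flag)."""
    stressed = bits.count(stress_bit)
    if stressed > len(bits) // 2:
        return [1 - b for b in bits], 1  # invert and set the flag
    return list(bits), 0

def decode_page(stored, flag):
    """Undo the flip on read using the stored flag."""
    return [1 - b for b in stored] if flag else list(stored)
```

The flag costs one cell per page, while guaranteeing the stressful pattern never occupies more than half of the data cells, which is the kind of worst-case bound a pattern-aware write path is after.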
Recent studies in algorithmic microfluidics have led to the development of several techniques for automated solution preparation using droplet-based digital microfluidic (DMF) biochips. A major challenge in this direction is to produce a mixture of several reactants with a desired ratio while optimizing reactant cost and preparation time. The sequence of mix-split operations to be performed on the droplets is usually represented as a mixing tree (or graph). In this paper, we present an efficient mixing algorithm, namely Mixing Tree with Common Subtrees (MTCS), for preparing single-target mixtures. MTCS attempts to best utilize intermediate droplets that would otherwise be wasted, and uses morphing based on permutations of leaf nodes to further reduce the graph size. The technique can be generalized to produce multi-target ratios, and we present another algorithm, namely Multiple Target Ratios (MTR). Additionally, in order to enhance the output load, we also propose an algorithm for droplet streaming called Multi-Target Multi-Demand (MTMD). Simulation results on a large set of target ratios show that MTCS can reduce the mean values of the total number of mix-split steps (Tms) and waste droplets (W) by 16% and 29% over Min-Mix (Thies et al., Natural Computing 2008), and by 22% and 34% over RMA (Roy et al., ACM TODAES 2015), respectively. Experimental results also suggest that MTR can reduce the average values of Tms and W by 23% and 44% over the repeated version of Min-Mix, by 30% and 49% over the repeated version of RMA, and by 9% and 22% over the repeated version of MTCS, respectively. It is observed that MTMD can reduce the mean values of Tms and W by 64% and 85%, respectively, over MTR. Thus the proposed multi-target techniques MTR and MTMD provide efficient solutions to multiple-demand, multi-target mixture preparation on a DMF platform.
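The underlying (1:1) mix-split model is easy to state concretely. MTCS itself is not reproduced here; the sketch below only shows the Min-Mix-style observation for the two-reactant case: a target concentration k/2^d of one reactant is reached in d mix steps by reading the bits of k from least- to most-significant, each step averaging the current droplet with a pure droplet.

```python
from fractions import Fraction

def two_reactant_mix(k, d):
    """(1:1) mix-split model: reach concentration k / 2**d of reactant A
    (the remainder being reactant B) in exactly d mix steps."""
    conc = Fraction(0)
    for i in range(d):
        bit = (k >> i) & 1  # next bit of k, LSB first
        # mix the current droplet with a pure droplet of A (bit=1) or B (bit=0);
        # a (1:1) mix averages the two concentrations
        conc = (conc + bit) / 2
    return conc
```

For example, k=5, d=3 walks through 1/2, 1/4, 5/8, ending at 5/8 = 5/2³. Multi-reactant mixing trees generalize this, and sharing common subtrees across such trees is what lets MTCS save mix-split steps and waste droplets.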
The area required by the combinational logic of a sequential circuit based on standard flip-flops can be reduced by identifying identical subcircuits. Pairs of matching subcircuits can then be replaced by circuits in which dual-edge-triggered flip-flops operate on multiplexed data at the rising and falling edges of the clock signal. We show how to modify the Boolean network describing the combinational logic to increase the opportunities for folding, without affecting its function. Experiments with benchmark circuits achieved an average reduction in circuit area of 18%.
Side channel attacks are a prominent threat to the security of embedded systems. To mount one, an adversary evaluates the goodness of fit of a key-dependent model to side channel measurements taken from an actual device, identifying the secret key value as the one yielding the best-fitting model. We investigate the mapping between the models and the sources of information leakage in the CPU microarchitecture in a post-map simulation environment, and classify the leakage sources affecting different parts of the microarchitecture. Finally, we are able to provide hints to the software architect on potential vulnerabilities.
The weakest link in cryptosystems is quite often the implementation rather than the mathematical underpinnings. A vast majority of attacks in the recent past have targeted programming flaws and bugs to break security systems. Due to their complexity, exhaustively verifying such systems is practically impossible, while manual verification as well as testing do not provide adequate guarantees. In this paper, we leverage model checking techniques to prove the functional correctness of an elliptic curve cryptography (ECC) library with respect to its formal specification. We demonstrate how the huge state space of the C library can be aptly verified using a hierarchical assume-guarantee verification strategy. To test the scalability of this approach, we verify the correctness of four NIST-specified elliptic curve implementations. The smallest curve, with a 192-bit prime field, took 1 day to verify, while the largest curve, with a 384-bit prime field, took 8 days.
Two novel layout generation methods for analog modules, a symmetrical twin-row style for MOS transistors and a twisted common-centroid style for unit-capacitor arrays, are introduced. Based on these constrained layout styles and the corresponding algorithms, symmetric and common-centroid placement patterns for analog devices are realized to guarantee the matching property. On this basis, as the most prominent contribution of this paper, channel-routing-based algorithms for the two proposed layout styles achieve 100% routability, owing to the constrained device placement and the correspondingly lower routing complexity. The improved algorithms also bring benefits such as smaller layout area, by maximizing the diffusion sharing of MOS transistors, and lower routing-layer usage for common-centroid device arrays. Moreover, we apply our algorithms to layout designs with typical analog modules such as a two-stage operational amplifier and SAR ADCs; generation results together with circuit simulations demonstrate the effectiveness of our algorithms in terms of routability and matching property. These feasible and efficient algorithms can also be readily extended to a variety of essential MOS analog circuits.
Introduction to the Special Section on Advances in Physical Design Automation
Outsourcing of design and manufacturing processes makes integrated circuits (ICs) vulnerable to adversarial changes and raises concerns about their integrity. Reverse engineering the manufactured netlist helps identify malicious insertions. In this paper, we present an automated approach that, given a reference design description, infers high-level blocks in an untrusted gate-level (test) implementation. Using the structural connectivity of the netlists, we compute a geometric embedding for each wire in the circuits, which is then used to compute a bipartite matching between the nodes of the two designs and identify functional units in the test circuit. Experiments evaluating the efficacy of the proposed technique on various-sized designs, including the multi-core processor OpenSparc T1, show that it can correctly match over 90% of the gates in the test circuit to their corresponding block in the reference model.
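The matching step can be illustrated in miniature. The paper computes an optimal bipartite matching; the sketch below substitutes a simpler greedy nearest-neighbor pairing on the wire embeddings (a stated simplification, with hypothetical names), just to show how embedding distance drives the correspondence between test and reference wires.

```python
def greedy_match(emb_ref, emb_test):
    """Pair each test wire with the closest unused reference wire by squared
    Euclidean distance in the embedding space. A greedy stand-in for the
    optimal bipartite matching used in the paper."""
    used = set()
    match = {}
    for t, vt in emb_test.items():
        best, best_d = None, float("inf")
        for r, vr in emb_ref.items():
            if r in used:
                continue
            d = sum((a - b) ** 2 for a, b in zip(vt, vr))
            if d < best_d:
                best, best_d = r, d
        match[t] = best
        used.add(best)
    return match
```

With well-separated embeddings the greedy result coincides with the optimal matching; an optimal solver (e.g., the Hungarian algorithm) matters when embeddings of distinct blocks lie close together.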