Logic encryption is an IC protection technique for preventing an IC design from overproduction and unauthorized use. It hides a designs functionality by inserting key gates and key inputs, such that a secret key is required to activate the design for functioning correctly. The security of a logic encryption algorithm is evaluated according to the difficulty of cracking the secret key. The state-of-the-art attack method identifies a secret key with a series of SAT solving calls to prune all the incorrect keys. Although it can break most of the existing logic encryption algorithms within a few hours, we observe that there exist two enhancements for increasing its efficiency. First, we introduce a pre-process to identify and eliminate redundant key inputs for simplifying the SAT problems. Second, we present a key checking process for increasing the pruned incorrect keys in each SAT solving iteration. We conducted the experiments on a set of benchmark circuits encrypted by six different logic encryption algorithms. The results show that the enhanced method can successfully unlock 10 benchmark circuits which originally cannot be cracked within 1 hour. For all the benchmark circuits, the average speedup is approximately 2.2x. Furthermore, a recent logic encryption method locks a design by creating cyclic paths, which can invalidate the SAT-based attack method. We analyze the impact of the cyclic paths and propose an enhancement to break the cyclic logic encryption method.
Ordered escape routing is a critical issue in high-speed PCB routing. Differential pair and thermal-blockage-avoided are useful in PCB design to obtain high noise immunity and low electromagnetic interference. In this paper, a Min-cost Multi-commodity Flow (MMCF) approach is proposed to solve the ordered escape routing. First, the characteristic of grid pin array and staggered pin array is analyzed and then a basic network model is used to convert ordered escape routing to MMCF model. To satisfy the constraints of ordered escape routing, three novel transformations, such as non-crossing transformation, ordering transformation and capacity transformation, are used to convert the basic network model to the final correct MMCF model. After that, the differential pair in ordered escape routing is discussed. Finally, a method to deal with the blockage issue is proposed. Experimental results show that our method achieves 100% routability for all the test cases. The method can get both a feasible solution and an optimal solution for ordered escape routing. Compared to published approaches, our method improves in both wire length and CPU time remarkably. At the same time, the proposed method can effectively avoid the blockage and deal with the differential pair.
Close-to-functional scan-based tests are expected to create close-to-functional operation conditions in order to avoid overtesting of delay faults. Existing metrics for the proximity to functional operation conditions are based on the scan-in state. For example, they consider the distance between the scan-in state and a reachable state (a state that the circuit can visit during functional operation). However, the deviation from functional operation conditions can increase during a test beyond the deviation that is measured by the scan-in state. To ensure that the deviation does not increase, this paper introduces the concept of a partially-invariant pattern. The paper describes a procedure for extracting partially-invariant patterns from functional broadside tests whose scan-in states are reachable states. Being partially-specified, partially-invariant patterns are suitable for test data compression. The paper studies the use of partially-invariant patterns for LFSR-based test data compression. Noting that a seed may not exist for a given partially-invariant pattern with a given LFSR, the procedure described in this paper uses an iterative process that not only matches a seed to a partially-invariant pattern, but also adjusts the partially-invariant pattern based on the test that the seed produces. The paper also addresses the selection of LFSRs for the generation of close-to-functional broadside tests based on partially-invariant patterns. Experimental results are presented to demonstrate the feasibility of the procedure.
The rise of mixed-critical embedded systems imposes novel challenges on the specification, development, and functional validation in a design flow. In the emerging dynamic scheduling context of mixed-criticality platforms, the system behaviour needs to be estimated in an early step in the design flow to assess the integration impact, especially for quality of service-driven, low-critical subsystems. We provide a modelling and integration flow for specifying, estimating, and evaluating software functions, ranging from an initial executable specification to an implementation candidate on an MPSoC. Based on a data-driven model to evaluate dynamic resource consumption effects of high-critical subsystems and the scheduling overhead, we propose a systematic method for constructing workload models of high-critical software components on the target. Our proxies provide an integration environment for low-critical functions by mimicking the high-critical temporal behaviour on the target. By integrating a low-critical video encoding subsystem with a benchmark suite as the high-critical subsystem we show that the performance model allows for evaluating end-to-end execution times in the low-critical function with an average error of 0.37 % and the application proxy only introduces a maximum error of 1.14 % in a performance evaluation.
Parallel on-chip voltage regulation, where multiple regulators are connected to the same power grid, has recently attracted significant attention with the proliferation of small on-chip voltage regulators. In this paper, the number, size, and location of parallel low-dropout (LDO) regulators and intentional decoupling capacitors are optimized using mixed integer non-linear programming (MINLP) formulation. The proposed optimization function concurrently considers multiple objectives such as area, power noise, and overall power consumption. Certain objectives are optimized by putting constraints to the other objectives with the proposed technique. Additional constraints have been added to avoid the overlap of LDOs and decoupling capacitors in the optimization process. The results of an optimized LDO allocation in the POWER8 chip is compared with the recent LDO allocation in the same IBM chip in a case study where a 20% reduction in the noise is achieved. The results of the proposed multi-criteria objective function under a different area, power, and noise constraints are also evaluated with a sample ISPD'11 benchmark circuits in another case study.
The legalization step is performed after global placement where wire length and routability are optimized or during timing optimization where buffer insertion or gate sizing are applied to meet timing requirements. Therefore, an ideal legalization approach must preserve the quality of the input placement in terms of routability, wire length, and timing constraints. These requirements indirectly impose maximum and average cell movement constraints during legalization. In addition, the legalization step should effectively manage white space availability with a highly efficient runtime in order to be used in an iterative process such as timing optimization. In this article, a robust and fast legalization method called Eh?Legalizer for standard-cell placement is presented. Eh?Legalizer legalizes input placements while minimizing the maximum and average cell movements using a highly efficient novel network flow-based approach. In contrast to the traditional network flow based legalizers, areas with high cell utilizations are effectively legalized by finding several candidate paths and there is no need for a post-process step. The experimental results conducted on several benchmarks show that Eh?Legalizer results in 2.5 times and 3.3 times less the maximum and average cell movement, respectively, while its runtime is significantly (18x) lower compared to traditional legalizers. In addition, the experimental results illustrate the scalability and robustness of Eh?Legalizer with respect to the floorplan complexity. Finally, the detailed-routing results show detailed routing violations are reduced on average by 23% when Eh?Legalizer is used to generate legal solutions.
Owing to existing intellectual properties (IPs), pre-routed nets, and power/ground wires, the routing of a System on Chip (SoC) design demands to detour around multi-layer (ML) obstacles. Maze routing has been applied to solve practical routing constraints; however, the much higher complexity of a multi-layer routing graph than that of a single-layer (SL) routing graph increases the required runtime of conducting maze routing on a multi-layer routing graph. Traditional approaches for the ML obstacle-avoiding rectilinear Steiner tree (ML-OARST) problem are thus non-maze routing based approaches for runtime issue; on the other hand, they cannot be directly applied to deal with additional constraints such as variant edge weights on a routing layer. In this paper, we propose the first maze routing-based methodology with bounded exploration and path-assessed retracing to reduce runtime and routing cost for the constrained ML-OARST construction problem. The exploration of maze routing is bounded to reduce the runtime; the costs of connecting pins are computed to select Steiner points in the retracing phase. To further reduce the routing cost, we develop a Steiner point-based ripping-up and re-building scheme for altering tree topology. Experimental results on industrial and randomly generated benchmarks demonstrate that the proposed methodology can provide a solution with good quality in terms of routing cost and has a significant speed-up compared to the traditional maze routing. A commercial tool is also used to show the effectiveness of the proposed methodology.
As the conventional silicon-based CMOS technology marches towards the sub-10nm region, the problem of high leakage power becomes increasingly serious. Under this circumstance, the carbon-nanotube field effect transistors (CNFETs) emerge as a promising alternative to the conventional CMOS devices. However, they experience much larger variability than the CMOS devices, which results in a large circuit delay variation and hence, a significant timing yield loss. One of the main variation sources is the carbon-nanotube (CNT) density variation. However, it shows a special property not existing for CMOS devices, namely the asymmetric spatial correlation. In this work, we propose a novel global placement algorithm to reduce the timing yield loss caused by the CNT density variation. We apply a statistical timing characterization to the CNFET standard cells for estimating the delay of the circuit under CNT density variation. Then, we apply a segment-based placement strategy to reduce the delays of the statistically critical paths. Experimental results demonstrated that our approach effectively improves the timing yield.
Modern field-programmable gate array (FPGA) devices contain complex clock architectures on top of configurable logics. Unlike application specific integrated circuits (ASICs), the physical structure of clock networks in an FPGA is pre-manufactured and cannot be adjusted to different applications. Furthermore, clock routing resources are typically limited for high-utilization designs. Consequently, clock architectures impose extra clock constraints and further complicate physical synthesis tasks such as placement. Traditional ASIC placement techniques only optimize conventional design metrics such as wirelength, routability, power, and timing without clock legality consideration. It is imperative to have new techniques to honor clock constraints during placement for FPGAs. In this paper, we propose a high-performance FPGA placement engine called UTPlaceF 2.0 that optimizes wirelength and routability while honoring complex clock constraints. Our proposed approaches consist of an iterative minimum-cost-flow-based cell assignment as well as a clock-aware packing for producing clock-legal yet high-quality placement solutions. UTPlaceF 2.0 won the first place in ISPD17 clock-aware FPGA placement contest organized by Xilinx, outperforming the second- and the third-place winners by 4.0% and 10.0%, respectively, in routed wirelength with competitive runtime, on a set of industry benchmarks.
Power consumption is identified as one of the main complications in designing practical wearable systems, mainly due to their stringent resource limitations. When designing wearable technologies, several system-level design choices, which directly contribute to the energy consumption of these systems, must be considered. In this paper, we propose a lightweight system optimization framework that trades-off power consumption and performance in connected wearable motion sensors. While existing approaches, exclusively focus on one or few specific design variables, our framework holistically finds the optimal power-performance solution with respect to the specified application need. This is formulated as a multi-variant non-convex optimization problem and therefore is hard to solve. To decrease the complexity, we propose a smoothing function to reduce this optimization to a convex problem. The reduced optimization is then solved in linear time using a devised derivative-free optimization approach, namely cyclic coordinate search. We evaluate our framework against several holistic optimization baselines using a real-world wearable activity recognition dataset. We minimize the energy consumption for various activity recognition performance thresholds ranging from 40% to 80% and demonstrate up to 64% energy saving.
Accounting for all operating conditions of a system at the design stage is typically infeasible for complex systems. Monitoring and verifying system requirements at runtime enables a system to continuously and introspectively ensure the system is operating correctly in the presence of dynamic execution scenarios. In this paper, we present a requirements-driven methodology enabling efficient runtime monitoring of embedded systems. The proposed approach extracts a runtime monitoring graph from system requirements specified using UML sequence diagrams. Non-intrusive, on-chip hardware dynamically monitors the system execution, verifies the execution adheres to the requirements model, and in the event of a failure provides detailed information that can be analyzed to determine the root cause. Using case studies of an autonomous vehicle and pacemaker prototypes, we analyze the relationship between event coverage, detection rate, and hardware requirements.
In this article, we propose a novel approach to detect the occupancy behavior of a building through the temperature and/or possible heat source information, which can be used for energy reduction and security monitoring for emerging smart buildings. Our work is based on a realistic building simulation program, EnergyPlus, which can model the various time-series inputs to a building such as ambient temperature, heating, ventilation, and air-conditioning (HVAC) inputs. Two machine learning based approaches for detecting human occupancy of a smart building are applied herein, namely: support vector regression (SVR) method and recurrent neural network (RNN) method. Experimental results with SVR method show that 4-feature model provides accurate detection rate giving a 0.638 average error and 0.0532 error ratio, and 5-feature model gives a 0.317 average error and 0.0264 error ratio. This indicates that SVR is a viable option for occupancy detection. In RNN method, Elman's RNN (ELNN) can estimate occupancy information of each room of a building with high accuracy. The error level, in terms of number of people can be as low as 0.0056 on average and 0.288 at maximum considering ambient, room temperatures and HVAC powers as detectable information. Our study further shows both methods can deliver similar accuracy in the occupancy detection. But that SVR model is more stable for changing features of the system, while the RNN method can deliver more accuracy when the features used in the model do not change a lot.
Recent studies in algorithmic microfluidics have led to the development of several techniques for automated solution preparation using droplet-based digital microfluidic (DMF) biochips. A major challenge in this direction is to produce a mixture of several reactants with a desired ratio while optimizing reactant-cost and preparation-time. The sequence of mix-split operations that are to be performed on the droplets is usually represented as a mixing-tree (or graph). In this paper, we present an efficient mixing algorithm namely Mixing Tree with Common Subtrees (MTCS) for preparing single-target mixtures. MTCS attempts to best-utilize intermediate droplets, which were otherwise wasted, and uses morphing based on permutation of leaf nodes to further reduce the graph-size. The technique can be generalized to produce multi-target ratios and we present another algorithm namely Multiple Target Ratios (MTR). Additionally, in order to enhance the output-load, we also propose an algorithm for droplet-streaming called Multi-Target Multi-Demand (MTMD). Simulation results on a large set of target-ratios show that MTCS can reduce the mean values of the total number of mix-split steps (Tms) and waste droplets (W) by 16% and 29% over Min-Mix (Thies et al. Natural Computing 2008), and by 22% and 34% over RMA (Roy et al. ACM TODAES 2015), respectively. Experimental results also suggest that MTR can reduce the average values of Tms and W by 23% and 44% over the repeated-version of Min-Mix, by 30% and 49% over the repeated-version of RMA, and by 9% and 22% over the repeated-version of MTCS, respectively. It is observed that MTMD can reduce the mean values of Tms and W by 64% and 85%, respectively, over MTR. Thus the proposed multi-target techniques MTR and MTMD provide efficient solutions to multiple-demand, multi-target mixture preparation on a DMF-platform.
Two novel layout generation methods of analog modules, a symmetrical twin-row-style for MOS transistors and a twisted common-centriod style for unit capacitors array, are introduced. Based on these limited layout styles and the reasonable algorithms, the symmetry and common-centriod placement patterns for analog devices are realized to guarantee the matching property. On this basis, as the most prominent contribution of this paper, the sensible channel routing based algorithms for two proposed layout styles could achieve the 100% routability due to the limited device placement and the corresponding lower routing complexity. The improved algorithms also bring benefits such as smaller layout area by maximizing the diffusion-sharing of MOS transistors, less routing layer usage for common-centroid devices array. Moreover, we apply our algorithms to layout designs with typical analog modules such as 2-stage operating amplifier and SAR-ADCs, generation results with the circuit simulations demonstrate the effectiveness of our algorithms in terms of the routability and matching property. These feasible and efficient algorithms can be also extended to apply varieties of essential MOS analog circuits simply.
Introduction to the Special Section on Advances in Physical Design Automation
Outsourcing of design and manufacturing processes makes integrated circuits (ICs) vulnerable to adversarial changes and raises concerns about their integrity. Reverse engineering the manufactured netlist helps identify malicious insertions. In this paper, we present an automated approach that, given a reference design description, infers high-level blocks in an untrusted gate-level (test) implementation. Using the structural connectivity of the netlists, we compute a geometric embedding for each wire in the circuits, which, then, is used to compute a bipartite matching between the nodes of the two designs and identify functional units in the test circuit. Experiments to evaluate the efficacy of the proposed technique on various-sized designs, including the multi-core processor OpenSparc T1, show that it can correctly match over 90% of gates in the test circuit to their corresponding block in the reference model.