Ordered escape routing is a critical issue in high-speed PCB routing. Differential pair and thermal-blockage-avoided are useful in PCB design to obtain high noise immunity and low electromagnetic interference. In this paper, a Min-cost Multi-commodity Flow (MMCF) approach is proposed to solve the ordered escape routing. First, the characteristic of grid pin array and staggered pin array is analyzed and then a basic network model is used to convert ordered escape routing to MMCF model. To satisfy the constraints of ordered escape routing, three novel transformations, such as non-crossing transformation, ordering transformation and capacity transformation, are used to convert the basic network model to the final correct MMCF model. After that, the differential pair in ordered escape routing is discussed. Finally, a method to deal with the blockage issue is proposed. Experimental results show that our method achieves 100% routability for all the test cases. The method can get both a feasible solution and an optimal solution for ordered escape routing. Compared to published approaches, our method improves in both wire length and CPU time remarkably. At the same time, the proposed method can effectively avoid the blockage and deal with the differential pair.
As designs continue to grow in size and complexity, EDA paradigm shifts from flat to hierarchical timing analysis. In this paper, we present compact and accurate timing macro modeling, which is the key to efficient and accurate hierarchical timing analysis. Our goal is to contain only a minimal amount of interface logic in our timing macro model. The main idea is to separate the interface logic into variant and constant timing regions. Then, the variant timing region is reserved for accuracy, while the constant timing region is reduced for compactness. For reducing the constant timing region, we propose anchor pin insertion and deletion by generalizing existing timing graph reduction techniques. Furthermore, we devise a lookup table index selection technique to achieve high model accuracy over the possible operating condition range. Compared with two common models used in industry, extracted timing model and interface logic model, our model has high model accuracy and small model size. Based on the TAU 2016 and 2017 timing macro modeling contest benchmark suites, our results show that our algorithm delivers superior efficiency and accuracy: Hierarchical timing analysis using our model can significantly reduce runtime and memory compared with flat timing analysis on the original design. Moreover, our algorithm outperforms TAU 2016 and 2017 contest winners in model accuracy, model size, model generation performance, and model usage performance.
Energy harvesting is becoming a favorable alternative to power future generation embedded systems since it is more environmental and user friendly. However, energy harvesting powered embedded systems suffer from frequent execution interruption due to unstable energy supply. To tackle this problem, non-volatile memory has been deployed to save the whole volatile state for computation. When power resumes, the processor can restore the state back to volatile memories and continue execution. However, without careful consideration, the process of checkpointing and resuming could cause inconsistency between volatile and non-volatile memories which leads to irreversible errors. In this paper, we propose a consistency aware adaptive checkpointing scheme that ensures correctness for all checkpoints. The proposed technique efficiently identifies all possible inconsistency positions in programs and inserts auxiliary code to ensure correctness by offline analysis. Besides, adaptive checkpointing assisted register file profiling and online tracking techniques further reduces the overhead of each checkpoint. Evaluation results show that the proposed checkpointing schematic can successfully eliminate inconsistency errors and greatly reduce the checkpointing overhead.
Parallel on-chip voltage regulation, where multiple regulators are connected to the same power grid, has recently attracted significant attention with the proliferation of small on-chip voltage regulators. In this paper, the number, size, and location of parallel low-dropout (LDO) regulators and intentional decoupling capacitors are optimized using mixed integer non-linear programming (MINLP) formulation. The proposed optimization function concurrently considers multiple objectives such as area, power noise, and overall power consumption. Certain objectives are optimized by putting constraints to the other objectives with the proposed technique. Additional constraints have been added to avoid the overlap of LDOs and decoupling capacitors in the optimization process. The results of an optimized LDO allocation in the POWER8 chip is compared with the recent LDO allocation in the same IBM chip in a case study where a 20% reduction in the noise is achieved. The results of the proposed multi-criteria objective function under a different area, power, and noise constraints are also evaluated with a sample ISPD'11 benchmark circuits in another case study.
The legalization step is performed after global placement where wire length and routability are optimized or during timing optimization where buffer insertion or gate sizing are applied to meet timing requirements. Therefore, an ideal legalization approach must preserve the quality of the input placement in terms of routability, wire length, and timing constraints. These requirements indirectly impose maximum and average cell movement constraints during legalization. In addition, the legalization step should effectively manage white space availability with a highly efficient runtime in order to be used in an iterative process such as timing optimization. In this article, a robust and fast legalization method called Eh?Legalizer for standard-cell placement is presented. Eh?Legalizer legalizes input placements while minimizing the maximum and average cell movements using a highly efficient novel network flow-based approach. In contrast to the traditional network flow based legalizers, areas with high cell utilizations are effectively legalized by finding several candidate paths and there is no need for a post-process step. The experimental results conducted on several benchmarks show that Eh?Legalizer results in 2.5 times and 3.3 times less the maximum and average cell movement, respectively, while its runtime is significantly (18x) lower compared to traditional legalizers. In addition, the experimental results illustrate the scalability and robustness of Eh?Legalizer with respect to the floorplan complexity. Finally, the detailed-routing results show detailed routing violations are reduced on average by 23% when Eh?Legalizer is used to generate legal solutions.
Owing to existing intellectual properties (IPs), pre-routed nets, and power/ground wires, the routing of a System on Chip (SoC) design demands to detour around multi-layer (ML) obstacles. Maze routing has been applied to solve practical routing constraints; however, the much higher complexity of a multi-layer routing graph than that of a single-layer (SL) routing graph increases the required runtime of conducting maze routing on a multi-layer routing graph. Traditional approaches for the ML obstacle-avoiding rectilinear Steiner tree (ML-OARST) problem are thus non-maze routing based approaches for runtime issue; on the other hand, they cannot be directly applied to deal with additional constraints such as variant edge weights on a routing layer. In this paper, we propose the first maze routing-based methodology with bounded exploration and path-assessed retracing to reduce runtime and routing cost for the constrained ML-OARST construction problem. The exploration of maze routing is bounded to reduce the runtime; the costs of connecting pins are computed to select Steiner points in the retracing phase. To further reduce the routing cost, we develop a Steiner point-based ripping-up and re-building scheme for altering tree topology. Experimental results on industrial and randomly generated benchmarks demonstrate that the proposed methodology can provide a solution with good quality in terms of routing cost and has a significant speed-up compared to the traditional maze routing. A commercial tool is also used to show the effectiveness of the proposed methodology.
As the conventional silicon-based CMOS technology marches towards the sub-10nm region, the problem of high leakage power becomes increasingly serious. Under this circumstance, the carbon-nanotube field effect transistors (CNFETs) emerge as a promising alternative to the conventional CMOS devices. However, they experience much larger variability than the CMOS devices, which results in a large circuit delay variation and hence, a significant timing yield loss. One of the main variation sources is the carbon-nanotube (CNT) density variation. However, it shows a special property not existing for CMOS devices, namely the asymmetric spatial correlation. In this work, we propose a novel global placement algorithm to reduce the timing yield loss caused by the CNT density variation. We apply a statistical timing characterization to the CNFET standard cells for estimating the delay of the circuit under CNT density variation. Then, we apply a segment-based placement strategy to reduce the delays of the statistically critical paths. Experimental results demonstrated that our approach effectively improves the timing yield.
Modern field-programmable gate array (FPGA) devices contain complex clock architectures on top of configurable logics. Unlike application specific integrated circuits (ASICs), the physical structure of clock networks in an FPGA is pre-manufactured and cannot be adjusted to different applications. Furthermore, clock routing resources are typically limited for high-utilization designs. Consequently, clock architectures impose extra clock constraints and further complicate physical synthesis tasks such as placement. Traditional ASIC placement techniques only optimize conventional design metrics such as wirelength, routability, power, and timing without clock legality consideration. It is imperative to have new techniques to honor clock constraints during placement for FPGAs. In this paper, we propose a high-performance FPGA placement engine called UTPlaceF 2.0 that optimizes wirelength and routability while honoring complex clock constraints. Our proposed approaches consist of an iterative minimum-cost-flow-based cell assignment as well as a clock-aware packing for producing clock-legal yet high-quality placement solutions. UTPlaceF 2.0 won the first place in ISPD17 clock-aware FPGA placement contest organized by Xilinx, outperforming the second- and the third-place winners by 4.0% and 10.0%, respectively, in routed wirelength with competitive runtime, on a set of industry benchmarks.
As computer architectures evolve, guaranteeing that Real-Time Systems' (RTSs') timing requirements are met through Worst Case Execution Time (WCET) upper bounds becomes increasingly difficult. Techniques such as Measurement-Based Probabilistic Timing Analysis (MBPTA) have emerged to estimate WCET bounds through Extreme Value Theory (EVT), a statistics branch designed to associate probabilities to extreme events. MBPTA aims providing WCET bounds exceeded only with arbitrarily low probabilities (i.e. pWCETs), for what the maximum observed execution times are expected to adhere to one of the extreme value models EVT uses. The Peaks Over Threshold (POT) approach for applying EVT involves adjusting a tail-shaped distribution, e.g. Generalized Pareto (GP) or Exponential, to the values that exceed a carefully selected high threshold. In this regard, several works suggest that GP should be adjusted for best representing different tail shapes, while others consider the Exponential model more adequate for providing upper bounds with increased reliability. In this work we perform empirical reliability and tightness evaluations of the pWCET estimates yielded by the GP and Exponential models while applying MBPTA through the POT approach. For that we compare, as modelling sample sizes are increased, the produced estimates and their confidence intervals with the highest values effectively observed on large validation samples (e.g. of size 108) of both real and synthetic nature.
Yield analysis of SRAM is a challenging issue because the failure rates of SRAM cells are extremely small. In this paper, an efficient non-Gaussian sampling method of cross entropy optimization is proposed for estimating the high sigma SRAM yield. Instead of sampling with the Gaussian distribution in existing methods, a non-Gaussian distribution, i.e. a joint 1-D generalized Pareto distribution (GPD) and (n-1)-D Gaussian distribution is taken as the function family of practical distribution, which is proved to be more suitable to fit the ideal distribution in the view of extreme failure event. To minimize the cross entropy between practical and ideal distributions, a sequential quadratic programing (SQP) solver with multiple starting points (MSP) strategy is applied for calculating the optimal parameters of practical distributions. Experimental results show that the proposed non-Gaussian sampling is 2.2-4.1x speedup over the Gaussian sampling, and in a whole it is about 1.6-2.3x speedup over state-of-the-art methods with low and high dimensional cases without loss of accuracy.
This paper presents a novel approach to overcome the challenges of third-party IP-integration based on software-defined System-on-Chips (SoC) and Graph Grammars. The IP-supplier prepares a HW-accelerated software library (HASL) for the SoC architect. As a key point of our approach, HW integration knowledge is encoded in the library as a set of rules. These rules are defined in the machine-readable standardized IP-XACT IEEE 1685-2009 format. The library preparation step also includes the generation of configurable HW drivers, schedulers and the software library functions. For the SoC architect, we have developed the graph-grammar-based IP-integration (GRIP) tool. The software application is developed using the functions supplied in the HASL. According to the utilized functions, the GRIP tool automatically integrates IP-blocks using the rule information supplied with the library. The SoC architecture and rules are transformed into the graph domain to apply graph rewriting methods. Iterative rule application spans a design space search tree with multiple candidate SoC architectures for a targeted software-defined SoC. The GRIP tool is model-driven and based on the Eclipse Modeling Framework (EMF). Applying code generation techniques, SoC candidate architectures can be transformed to hardware descriptions for the target platform. The HW/SW interfaces between SW library functions and IP blocks can be automatically generated for bare-metal or Linux-based applications. The approach is demonstrated with a case-study on the Xilinx Zynq-based ZedBoard evaluation board using a HASL for computer vision, which yielded 150x performance improvement. The generated drivers overhead is only 0.28% compared to the manually-written drivers.
The new generation of electrowetting-on-dielectric (EWOD) based digital microfluidic chips deploys Active-Matrix (AM) architecture that consists of a dense array of micro-electrodes, which are much smaller in size than those used in conventional EWOD biochips. Such AM-EWOD architecture offers many advantages including the capability of handling variable-size droplets, more flexible droplet movement, and precise control over droplet navigation. In this paper, we propose the first algorithm to determine droplet routing paths on an AM-EWOD chip that allows splitting of a droplet into smaller droplets with variable sizes, reshaping them depending on the congestion of the intended path, and finally merging them to recover the original droplet at the destination. The proposed negotiation-based routing algorithm optimizes the routing cost and provides more freedom in deadlock avoidance in the presence of multiple routing tasks compared to existing techniques. For a number of benchmark and random test suites, it reduces latest-arrival-time by an average of 29%. Furthermore, our method is observed to provide 100% routability of nets for all test cases, whereas existing and Baseline routers fail to produce feasible solutions for many instances. We also propose a reliable-mode droplet routing strategy where the number of unbalanced splitting operations can be reduced significantly by paying small penalty on latest-arrival-time.
Two novel layout generation methods of analog modules, a symmetrical twin-row-style for MOS transistors and a twisted common-centriod style for unit capacitors array, are introduced. Based on these limited layout styles and the reasonable algorithms, the symmetry and common-centriod placement patterns for analog devices are realized to guarantee the matching property. On this basis, as the most prominent contribution of this paper, the sensible channel routing based algorithms for two proposed layout styles could achieve the 100% routability due to the limited device placement and the corresponding lower routing complexity. The improved algorithms also bring benefits such as smaller layout area by maximizing the diffusion-sharing of MOS transistors, less routing layer usage for common-centroid devices array. Moreover, we apply our algorithms to layout designs with typical analog modules such as 2-stage operating amplifier and SAR-ADCs, generation results with the circuit simulations demonstrate the effectiveness of our algorithms in terms of the routability and matching property. These feasible and efficient algorithms can be also extended to apply varieties of essential MOS analog circuits simply.