Three-dimensional (3D) integration enables the design of high-performance and energy-efficient network-on-chip (NoC) architectures as communication backbones for manycore chips. To exploit the benefits of the vertical dimension of 3D integration, through-silicon vias (TSVs) have been predominantly used in state-of-the-art manycore chip design. However, for TSV-based systems, high power density and the resultant thermal hotspots remain major concerns for chip functionality and overall reliability. The power consumption and thermal profiles of 3D NoCs can be improved by incorporating a Voltage-Frequency-Island (VFI)-based power management strategy. However, due to the inherent thermal constraints of a TSV-based 3D system, the benefits offered by this power management methodology cannot be fully exploited. In this context, the emergence of monolithic 3D (M3D) integration has opened up new possibilities for designing ultra-low-power and high-performance circuits and systems. The smaller dimensions of the inter-layer dielectric (ILD) and monolithic inter-tier vias (MIVs) offer high-density integration, flexibility in partitioning logic blocks across multiple tiers, and a significant reduction in total wire length. In this work, we present the first-ever study of the performance-thermal trade-offs for energy-efficient monolithic 3D manycore chips. In particular, we present a comparative performance evaluation of M3D NoCs with respect to their conventional TSV-based counterparts. We demonstrate that the proposed M3D-based NoC architecture incorporating VFI-based power management achieves up to 29.4% lower energy-delay product (EDP) compared to the TSV-based designs for a large set of benchmarks. We also demonstrate that the M3D-based NoC shows up to 29.1% lower maximum temperature than its TSV-based counterpart on these benchmarks.
Close-to-functional scan-based tests are expected to create close-to-functional operation conditions in order to avoid overtesting of delay faults. Existing metrics for the proximity to functional operation conditions are based on the scan-in state. For example, they consider the distance between the scan-in state and a reachable state (a state that the circuit can visit during functional operation). However, the deviation from functional operation conditions can increase during a test beyond the deviation that is measured by the scan-in state. To ensure that the deviation does not increase, this paper introduces the concept of a partially-invariant pattern. The paper describes a procedure for extracting partially-invariant patterns from functional broadside tests whose scan-in states are reachable states. Being partially-specified, partially-invariant patterns are suitable for test data compression. The paper studies the use of partially-invariant patterns for LFSR-based test data compression. Noting that a seed may not exist for a given partially-invariant pattern with a given LFSR, the procedure described in this paper uses an iterative process that not only matches a seed to a partially-invariant pattern, but also adjusts the partially-invariant pattern based on the test that the seed produces. The paper also addresses the selection of LFSRs for the generation of close-to-functional broadside tests based on partially-invariant patterns. Experimental results are presented to demonstrate the feasibility of the procedure.
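The seed-matching step described above can be illustrated as solving a linear system over GF(2): every output bit of an LFSR is a linear combination of the seed bits, so the specified bits of a partially-invariant pattern become linear constraints on the seed. The sketch below assumes a generic Fibonacci-style LFSR; the tap positions, the solve_seed helper, and the example pattern are illustrative and not taken from the paper.

def solve_seed(taps, n, pattern):
    """Return an n-bit seed (list of 0/1) whose LFSR output stream matches the
    specified bits of 'pattern' (entries 0/1, or None when unspecified), or
    None if no such seed exists for this LFSR."""
    # Track each LFSR cell symbolically as a GF(2) combination of the unknown
    # seed bits, encoded as a bit mask (bit j of the mask <-> seed bit j).
    state = [1 << j for j in range(n)]
    rows, rhs = [], []
    for bit in pattern:
        out = state[0]                         # bit shifted out this cycle
        fb = 0
        for t in taps:                         # feedback = XOR of tapped cells
            fb ^= state[t]
        state = state[1:] + [fb]
        if bit is not None:                    # specified bit -> one equation
            rows.append(out)
            rhs.append(bit)
    # Gaussian elimination over GF(2) on the system (rows | rhs).
    pivot_cols, r = [], 0
    for col in range(n):
        piv = next((i for i in range(r, len(rows)) if (rows[i] >> col) & 1), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        rhs[r], rhs[piv] = rhs[piv], rhs[r]
        for i in range(len(rows)):
            if i != r and (rows[i] >> col) & 1:
                rows[i] ^= rows[r]
                rhs[i] ^= rhs[r]
        pivot_cols.append(col)
        r += 1
    if any(rows[i] == 0 and rhs[i] == 1 for i in range(len(rows))):
        return None                            # inconsistent: no seed exists
    seed = [0] * n
    for i, col in enumerate(pivot_cols):       # free seed bits default to 0
        seed[col] = rhs[i]
    return seed

# Example: an 8-bit LFSR with illustrative taps; prints a matching seed, or
# None if this LFSR cannot reproduce the specified bits of the pattern.
pattern = [1, None, 0, None, 1, 1, None, None, 0, None, 1, None, None, 0, None, 1]
print(solve_seed(taps=[0, 2, 3, 7], n=8, pattern=pattern))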
Earlier works observed that certain primary inputs have preferred values, which help increase the gate-level fault coverage when they appear in a functional test sequence. This paper observes that multiplexers present additional opportunities for increasing the fault coverage of a functional test sequence, which are not captured by preferred primary input values. Because multiplexers are prevalent, their effect on the fault coverage can be significant. A static analysis that is independent of any functional test sequence is performed in this paper to identify preferred values for the outputs of multiplexers. This is followed by a dynamic analysis that adjusts the select inputs of the multiplexers for a given functional test sequence in order to ensure that the preferred values appear on the outputs of the multiplexers more often. The analysis yields design-for-testability logic for the select inputs of the multiplexers that have preferred values. The logic is independent of the functional test sequence, and it allows the fault coverage to be increased when the select inputs are not primary inputs, or when the same select inputs are used for different multiplexers. Experimental results are presented to demonstrate that this approach has a significant effect on the fault coverage of functional test sequences.
Typical modern HW designs include many blocks associated with thousands of design properties. Having today's commercial formal verifiers utilize a complementary set of state-of-the-art formal algorithms is key to enabling formal verification tools to successfully cope with verification problems of different sizes, types, and complexities. Formal engine orchestration is the methodology used to pick the most appropriate formal engine for a specific verification problem. It assures proper scheduling of the formal engines to minimize the time consumed to solve individual design verification problems, and hence strongly affects the time required to verify the overall design properties. This work proposes the use of supervised machine learning classification techniques to guide the orchestration step by predicting the formal engines that should be assigned to a design property. Up to 16,500 formal verification runs on RTL designs and their properties are used to train the classifier and create a prediction model. The classifier assigns any new verification problem to an appropriate list of formal engines, together with a probability distribution over the set of engine classes. Our results show that the proposed model improves the formal suite total run time by up to 98% of its maximum allowable time improvement using multi-classification-based orchestration and nominates the appropriate formal engines for new-to-verify HW designs with 88% accuracy.
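As a rough illustration of multi-classification-based orchestration, the sketch below trains a standard multi-class classifier on historical runs and ranks engines for a new property by predicted probability. The feature set, engine names, and synthetic data are placeholders, not the features or engines used in the paper.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the training data: one row of property features
# (e.g., flop count, cone size, property depth) per historical formal run,
# labeled with the engine that solved it fastest.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 6))
engines = np.array(["bmc", "ic3", "interpolation", "bdd_reach"])
y = engines[rng.integers(0, 4, size=2000)]

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# For a new property, rank the engines by predicted probability and schedule
# the top-ranked ones first in the orchestration flow.
proba = clf.predict_proba(rng.normal(size=(1, 6)))[0]
ranking = [clf.classes_[i] for i in np.argsort(proba)[::-1]]
print(ranking)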
As conventional silicon-based CMOS technology marches toward the sub-10nm regime, the problem of high leakage power becomes increasingly serious. Under this circumstance, carbon-nanotube field-effect transistors (CNFETs) have emerged as a promising alternative to conventional CMOS devices. However, they exhibit much larger variability than CMOS devices, which results in large circuit delay variation and hence a significant timing yield loss. One of the main variation sources is the carbon-nanotube (CNT) density variation, which exhibits a property not found in CMOS devices, namely asymmetric spatial correlation. In this work, we propose a novel global placement algorithm to reduce the timing yield loss caused by CNT density variation. We apply a statistical timing characterization to the CNFET standard cells to estimate the delay of the circuit under CNT density variation. Then, we apply a segment-based placement strategy to reduce the delays of the statistically critical paths. Experimental results demonstrate that our approach effectively improves the timing yield.
Power consumption is identified as one of the main complications in designing practical wearable systems, mainly due to their stringent resource limitations. When designing wearable technologies, several system-level design choices, which directly contribute to the energy consumption of these systems, must be considered. In this paper, we propose a lightweight system optimization framework that trades off power consumption and performance in connected wearable motion sensors. While existing approaches focus exclusively on one or a few specific design variables, our framework holistically finds the optimal power-performance solution with respect to the specified application needs. This is formulated as a multivariate non-convex optimization problem, which is therefore hard to solve. To decrease the complexity, we propose a smoothing function that reduces this optimization to a convex problem. The reduced problem is then solved in linear time using a derivative-free optimization approach, namely cyclic coordinate search. We evaluate our framework against several holistic optimization baselines using a real-world wearable activity recognition dataset. We minimize the energy consumption for various activity recognition performance thresholds ranging from 40% to 80% and demonstrate up to 64% energy savings.
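A minimal sketch of the derivative-free solver named in the abstract, cyclic coordinate search, is shown below; the smoothed energy objective and the design variables (sampling rate, transmission interval, window size) are hypothetical stand-ins for the paper's actual model.

import numpy as np

def cyclic_coordinate_search(f, x0, bounds, step=0.25, tol=1e-4, max_iter=200):
    """Derivative-free cyclic coordinate search: sweep the design variables
    one at a time, probing +/- step along each axis and shrinking the step
    when no axis yields an improvement."""
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        improved = False
        for i in range(len(x)):
            for delta in (+step, -step):
                cand = x.copy()
                cand[i] = np.clip(cand[i] + delta, *bounds[i])
                if f(cand) < f(x):
                    x, improved = cand, True
                    break
        if not improved:
            step *= 0.5
            if step < tol:
                break
    return x

# Hypothetical smooth energy surrogate over [sampling rate, tx interval,
# window size]; the closed form is made up for illustration only.
energy = lambda x: 0.6 * x[0] ** 2 + 0.3 / (x[1] + 0.1) + 0.1 * (x[2] - 2) ** 2
best = cyclic_coordinate_search(energy, x0=[1.0, 1.0, 1.0], bounds=[(0.1, 4.0)] * 3)
print(best)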
Accounting for all operating conditions of a system at the design stage is typically infeasible for complex systems. Monitoring and verifying system requirements at runtime enables a system to continuously and introspectively ensure that it is operating correctly in the presence of dynamic execution scenarios. In this paper, we present a requirements-driven methodology that enables efficient runtime monitoring of embedded systems. The proposed approach extracts a runtime monitoring graph from system requirements specified using UML sequence diagrams. Non-intrusive on-chip hardware dynamically monitors the system execution, verifies that the execution adheres to the requirements model, and, in the event of a failure, provides detailed information that can be analyzed to determine the root cause. Using case studies of autonomous vehicle and pacemaker prototypes, we analyze the relationship between event coverage, detection rate, and hardware requirements.
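As a toy illustration of replaying execution events against a monitoring graph derived from a requirements model, the sketch below checks an observed event trace against allowed transitions and reports the first violation; the states and event names are invented for illustration and are not taken from the paper's case studies.

# Hypothetical monitoring graph: current state -> {allowed event: next state}.
MONITOR_GRAPH = {
    "idle":    {"sense":   "sensing"},
    "sensing": {"pace":    "pacing", "inhibit": "idle"},
    "pacing":  {"confirm": "idle"},
}

def run_monitor(events, start="idle"):
    """Replay an observed event trace against the monitoring graph and report
    the first event that violates the modeled requirement, if any."""
    state = start
    for i, ev in enumerate(events):
        nxt = MONITOR_GRAPH.get(state, {}).get(ev)
        if nxt is None:
            return {"violation_at": i, "state": state, "event": ev}
        state = nxt
    return {"violation_at": None, "final_state": state}

# This trace violates the model at its last event ("confirm" while "sensing").
print(run_monitor(["sense", "pace", "confirm", "sense", "confirm"]))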
Real-time data analytics for smart-grid energy management is challenging when both occupant behavior profiles and energy profiles must be considered. This paper proposes a distributed and networked machine learning platform on a smart-gateway-based smart grid. It can analyze occupant behaviors, provide short-term load forecasting, and allocate renewable energy resources. First, the occupant behavior profile is captured by a real-time indoor positioning system with WiFi data analytics, and the energy profile is extracted by a real-time metering system with electricity load data analytics. Then, the 24-hour occupant behavior profile and energy profile are fused and predicted using an online distributed machine learning algorithm with real-time data updates. Based on the forecasted occupant behavior and energy profiles, solar energy is allocated to reduce the peak demand on the main electricity power grid. The whole management flow can be operated on the distributed smart-gateway network with limited computational resources, supported by a general machine-learning engine. Experimental results on occupant behavior extraction show that the proposed algorithm achieves 50x and 38x speed-ups during data testing and training, respectively, with comparable indoor positioning accuracy, when compared to the traditional support vector machine (SVM) method. Furthermore, for short-term load forecasting, it is 14.83% more accurate than SVM-based data analytics. Based on the predicted occupant behavior and energy profiles, our proposed energy management system (EMS) achieves 19.66% more peak load reduction and 26.41% more cost saving compared to the SVM-based method.
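The short-term load forecasting step can be illustrated with a generic online learner that is updated incrementally as new meter readings arrive; the sketch below uses lagged loads and an hour-of-day feature on synthetic data and is not the distributed algorithm proposed in the paper.

import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic hourly load with a daily pattern (60 days); lagged loads plus a
# time-of-day term serve as features.
rng = np.random.default_rng(0)
hours = np.arange(24 * 60)
load = 5 + 2 * np.sin(2 * np.pi * hours / 24) + 0.3 * rng.normal(size=hours.size)

def features(t):
    return [load[t - 1], load[t - 2], load[t - 24], np.sin(2 * np.pi * t / 24)]

X = np.array([features(t) for t in range(24, hours.size)])
y = load[24:]
scaler = StandardScaler().fit(X[:48])
model = SGDRegressor(learning_rate="constant", eta0=0.01)

# Stream the data in daily chunks, updating the model as new meter readings
# arrive (real-time data update).
for start in range(0, len(X) - 24, 24):
    Xb = scaler.transform(X[start:start + 24])
    model.partial_fit(Xb, y[start:start + 24])

next_day_pred = model.predict(scaler.transform(X[-24:]))
print(next_day_pred[:5])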
The development of cyber-physical systems and the Internet of Things (IoT) have a significant potential to improve the effectiveness of assistive technologies for those with physical disabilities. To be practical, assistive systems should minimize the number of inputs from users, reducing the cognitive and physical effort required. This paper presents a versatile, energy-efficient framework and algorithm for assistive indoor navigation using an electric wheelchair and user inputs from multiple modalities. The proposed algorithm automates indoor navigation using only a few user commands captured through a wearable device, with the goal of simplifying navigation tasks and making them more instinctive for the user. We evaluated the proposed methodology using both a virtual smart building and a prototype built with off-the-shelf IoT development boards. Our evaluations for three different floorplans show one order of magnitude reduction in user effort and communication energy required for navigation when compared with conventional navigation methodologies that require continuous user inputs.
In this article, we propose a novel approach to detect the occupancy behavior of a building from temperature and/or possible heat source information, which can be used for energy reduction and security monitoring in emerging smart buildings. Our work is based on a realistic building simulation program, EnergyPlus, which can model the various time-series inputs to a building such as ambient temperature and heating, ventilation, and air-conditioning (HVAC) inputs. Two machine learning based approaches for detecting human occupancy of a smart building are applied herein: the support vector regression (SVR) method and the recurrent neural network (RNN) method. Experimental results with the SVR method show that a 4-feature model provides an accurate detection rate, with a 0.638 average error and a 0.0532 error ratio, while a 5-feature model gives a 0.317 average error and a 0.0264 error ratio, indicating that SVR is a viable option for occupancy detection. With the RNN method, an Elman RNN (ELNN) can estimate the occupancy of each room of a building with high accuracy. The error, in terms of the number of people, can be as low as 0.0056 on average and 0.288 at maximum when ambient temperature, room temperatures, and HVAC powers are used as detectable information. Our study further shows that both methods deliver similar accuracy in occupancy detection, but the SVR model is more stable under changes to the system's features, while the RNN method delivers higher accuracy when the features used in the model do not change much.
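A bare-bones version of the SVR-based occupancy estimator might look like the following; the five features and the synthetic data are placeholders for the EnergyPlus-derived inputs used in the article.

import numpy as np
from sklearn.svm import SVR

# Hypothetical 5-feature rows, e.g. [ambient temp, room temp, HVAC power,
# hour of day, weekday flag], with per-room occupant counts as targets; the
# values below are synthetic placeholders, not simulator outputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = rng.integers(0, 6, size=500).astype(float)

model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
print(model.predict(X[:3]))   # estimated occupant counts for three samples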
Recent studies in algorithmic microfluidics have led to the development of several techniques for automated solution preparation using droplet-based digital microfluidic (DMF) biochips. A major challenge in this direction is to produce a mixture of several reactants with a desired ratio while optimizing reactant cost and preparation time. The sequence of mix-split operations to be performed on the droplets is usually represented as a mixing tree (or graph). In this paper, we present an efficient mixing algorithm, namely Mixing Tree with Common Subtrees (MTCS), for preparing single-target mixtures. MTCS attempts to best utilize intermediate droplets that would otherwise be wasted, and uses morphing based on permutation of leaf nodes to further reduce the graph size. The technique can be generalized to produce multi-target ratios, and we present another algorithm, namely Multiple Target Ratios (MTR). Additionally, in order to enhance the output load, we also propose an algorithm for droplet streaming called Multi-Target Multi-Demand (MTMD). Simulation results on a large set of target ratios show that MTCS can reduce the mean values of the total number of mix-split steps (Tms) and waste droplets (W) by 16% and 29% over Min-Mix (Thies et al., Natural Computing, 2008), and by 22% and 34% over RMA (Roy et al., ACM TODAES, 2015), respectively. Experimental results also suggest that MTR can reduce the average values of Tms and W by 23% and 44% over the repeated version of Min-Mix, by 30% and 49% over the repeated version of RMA, and by 9% and 22% over the repeated version of MTCS, respectively. It is observed that MTMD can reduce the mean values of Tms and W by 64% and 85%, respectively, over MTR. Thus, the proposed multi-target techniques MTR and MTMD provide efficient solutions to multiple-demand, multi-target mixture preparation on a DMF platform.
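For readers unfamiliar with the mixing-tree model, the short sketch below evaluates the reactant ratio produced by a (1:1) mix-split tree, where every leaf at depth l contributes 1/2**l of the final droplet; it is a generic illustration of the representation, not an implementation of MTCS, MTR, or MTMD.

from collections import Counter
from fractions import Fraction

def mixture_ratio(tree):
    """Return the reactant composition produced by a (1:1) mix-split tree
    given as nested 2-tuples with reactant names at the leaves."""
    if not isinstance(tree, tuple):
        return Counter({tree: Fraction(1)})
    left, right = tree
    out = Counter()
    for sub in (mixture_ratio(left), mixture_ratio(right)):
        for reagent, frac in sub.items():
            out[reagent] += frac / 2       # each parent halves its contribution
    return out

# Example: mixing (R1,R2) with (R1,R3) yields R1:R2:R3 = 2:1:1 (denominator 4).
print(mixture_ratio((("R1", "R2"), ("R1", "R3"))))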
Side-channel attacks are a prominent threat to the security of embedded systems. To mount one, an adversary evaluates the goodness of fit of a key-dependent model to the side-channel measurements taken from an actual device, identifying the secret key value as the one yielding the best-fitting model. We investigate the mapping between the models and the sources of information leakage in the CPU microarchitecture in a post-map simulation environment and classify the leakage sources affecting different parts of the microarchitecture. Finally, we provide hints to the software architect on potential vulnerabilities.
The weakest link in cryptosystems is quite often the implementation rather than the mathematical underpinnings. A vast majority of attacks in the recent past have targeted programming flaws and bugs to break security systems. Due to their complexity, empirically verifying such systems is practically impossible, while manual verification and testing do not provide adequate guarantees. In this paper, we leverage model checking techniques to prove the functional correctness of an elliptic curve cryptography (ECC) library with respect to its formal specification. We demonstrate how the huge state space of the C library can be aptly verified using a hierarchical assume-guarantee verification strategy. To test the scalability of this approach, we verify the correctness of four NIST-specified elliptic curve implementations. The smallest curve, with a 192-bit prime field, took 1 day to verify, while the largest curve, with a 384-bit prime field, took 8 days.
Two novel layout generation methods for analog modules, a symmetrical twin-row style for MOS transistors and a twisted common-centroid style for unit-capacitor arrays, are introduced. Based on these constrained layout styles and the corresponding algorithms, symmetric and common-centroid placement patterns for analog devices are realized to guarantee the matching property. On this basis, and as the most prominent contribution of this paper, channel-routing-based algorithms for the two proposed layout styles achieve 100% routability, owing to the constrained device placement and the correspondingly lower routing complexity. The improved algorithms also bring benefits such as smaller layout area, by maximizing the diffusion sharing of MOS transistors, and reduced routing-layer usage for common-centroid device arrays. Moreover, we apply our algorithms to layout designs of typical analog modules such as a two-stage operational amplifier and SAR ADCs; the generated results, together with circuit simulations, demonstrate the effectiveness of our algorithms in terms of routability and matching property. These feasible and efficient algorithms can also be readily extended to a variety of essential MOS analog circuits.
Introduction to the Special Section on Advances in Physical Design Automation
Outsourcing of design and manufacturing processes makes integrated circuits (ICs) vulnerable to adversarial changes and raises concerns about their integrity. Reverse engineering the manufactured netlist helps identify malicious insertions. In this paper, we present an automated approach that, given a reference design description, infers high-level blocks in an untrusted gate-level (test) implementation. Using the structural connectivity of the netlists, we compute a geometric embedding for each wire in the circuits, which is then used to compute a bipartite matching between the nodes of the two designs and to identify functional units in the test circuit. Experiments evaluating the efficacy of the proposed technique on designs of various sizes, including the multi-core processor OpenSparc T1, show that it can correctly match over 90% of the gates in the test circuit to their corresponding blocks in the reference model.
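The embedding-and-matching step can be sketched as a minimum-cost bipartite assignment over pairwise embedding distances; the random embeddings below are synthetic stand-ins for the structural embeddings computed from the reference and test netlists.

import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

# Hypothetical embeddings (one row per wire/node), e.g. produced by a spectral
# or random-walk embedding of each netlist graph; here the test design is
# modeled as a noisy copy of the reference design.
rng = np.random.default_rng(1)
ref_emb = rng.normal(size=(100, 16))
test_emb = ref_emb + 0.05 * rng.normal(size=(100, 16))

cost = cdist(test_emb, ref_emb)              # pairwise embedding distances
rows, cols = linear_sum_assignment(cost)      # min-cost bipartite matching
match = dict(zip(rows, cols))                 # test node -> reference node
print(sum(r == c for r, c in match.items()), "nodes matched correctly")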