<?xml version="1.0" ?>
<rss version="2.0">
	<channel>
		<title><![CDATA[ Very Large Scale Integration (VLSI) Systems, IEEE Transactions on - new TOC ]]></title>
		<link>http://ieeexplore.ieee.org</link>
		<description>TOC Alert for Publication# 92</description>
		<year>2009</year>
		<month>November</month>
		<day>19</day>
		<item>
			<title><![CDATA[Table of contents]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=5337738]]></link>
			<description><![CDATA[ ]]></description>
			<pubDate><![CDATA[Dec.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=5337738]]></guid>
			<volume>17</volume>
			<issue>12</issue>
			<startPage>C1</startPage>
			<endPage>C1</endPage>
			<fileSize>42</fileSize>
			<authors><![CDATA[]]></authors>
		</item>
		<item>
			<title><![CDATA[IEEE Transactions on Very Large Scale Integration (VLSI) Systems publication information]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=5337739]]></link>
			<description><![CDATA[ ]]></description>
			<pubDate><![CDATA[Dec.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=5337739]]></guid>
			<volume>17</volume>
			<issue>12</issue>
			<startPage>C2</startPage>
			<endPage>C2</endPage>
			<fileSize>40</fileSize>
			<authors><![CDATA[]]></authors>
		</item>
		<item>
			<title><![CDATA[A Fast Built-in Redundancy Analysis for Memories With Optimal Repair Rate Using a Line-Based Search Tree]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=4801554]]></link>
			<description><![CDATA[<para> With the growth of memory capacity and density, test cost and yield improvement are becoming more important. In the case of embedded memories for systems-on-a-chip (SOC), built-in redundancy analysis (BIRA) is widely used to address quality and yield issues by replacing faulty cells with extra good cells. However, previous BIRA approaches focused mainly on embedded memories rather than commodity memories. Many BIRA approaches require extra hardware overhead to achieve the optimal repair rate, which means that 100% of solution detection is guaranteed for intrinsically repairable dies, or they suffer a loss of repair rate to minimize the hardware overhead. In order to achieve both low area overhead and optimal repair rate, a novel BIRA approach is proposed that builds a line-based search tree. The proposed BIRA minimizes the storage capacity requirements to store faulty address information by dropping all unnecessary faulty addresses for inherently repairable dies. The proposed BIRA analyzes redundancies quickly and efficiently with optimal repair rate by using a selected fail count comparison algorithm. Experimental results show that the proposed BIRA achieves optimal repair rate, fast analysis speed, and nearly optimal repair solutions with a relatively small area overhead. </para>]]></description>
			<pubDate><![CDATA[Dec.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=4801554]]></guid>
			<volume>17</volume>
			<issue>12</issue>
			<startPage>1665</startPage>
			<endPage>1678</endPage>
			<fileSize>1703</fileSize>
			<authors><![CDATA[Jeong, W.;Kang, I.;Jin, K.;Kang, S.;]]></authors>
		</item>
		<item>
			<title><![CDATA[Impact of Die-to-Die and Within-Die Parameter Variations on the Clock Frequency and Throughput of Multi-Core Processors]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=4956977]]></link>
			<description><![CDATA[<para> A statistical performance simulator is developed to explore the impact of parameter variations on the maximum clock frequency (FMAX) and throughput distributions of multi-core processors in a future 22 nm technology. The simulator captures the effects of die-to-die (D2D) and within-die (WID) transistor and interconnect parameter variations on critical path delays in a die. A key component of the simulator is an analytical multi-core processor throughput model, which enables computationally efficient and accurate throughput calculations, as compared with cycle-accurate performance simulators, for single-threaded and highly parallel multi-threaded (MT) workloads. Based on microarchitecture designs from previous microprocessors, three multi-core processors with either small, medium, or large cores are projected for the 22 nm technology generation to investigate a range of design options. These three multi-core processors are optimized for maximum throughput within a constant die area. A traditional single-core processor is also scaled to the 22 nm technology to provide a baseline comparison. <emphasis emphasistype="boldital">The salient contributions from this paper are: 1) product-level variation analysis for multi-core processors must focus on throughput, rather than just FMAX, and 2) multi-core processors are more variation tolerant than single-core processors due to the larger impact of memory latency and bandwidth on throughput.</emphasis> To elucidate these two points, statistical simulations indicate that multi-core and single-core processors with an equivalent total core area have similar FMAX distributions (mean degradation of 9% and standard deviation of 5%) for MT applications. In contrast to single-core processors, memory latency and bandwidth constraints significantly limit the throughput dependency on FMAX in multi-core processors, thus reducing the throughput mean degradation and standard deviation by <formula formulatype="inline"><tex Notation="TeX">$\sim$</tex></formula>50% for the small and medium core designs and by <formula formulatype="inline"><tex Notation="TeX">$\sim$</tex></formula>30% for the large core design. This improvement in the throughput distribution indicates that multi-core processors could significantly reduce the product design and process development complexities due to parameter variations as compared to single-core processors, enabling faster time to market for high-performance microprocessor products. </para>]]></description>
			<pubDate><![CDATA[Dec.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=4956977]]></guid>
			<volume>17</volume>
			<issue>12</issue>
			<startPage>1679</startPage>
			<endPage>1690</endPage>
			<fileSize>713</fileSize>
			<authors><![CDATA[Bowman, K. A.;Alameldeen, A. R.;Srinivasan, S. T.;Wilkerson, C. B.;]]></authors>
		</item>
		<item>
			<title><![CDATA[Hardware Acceleration for Media/Transaction Applications in Network Processors]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=4803735]]></link>
			<description><![CDATA[<para> As the network environment is rapidly changing, network interfaces demand highly intelligent traffic management (on the control plane) in addition to the basic requirement of wire-speed packet forwarding (on the data plane). Several vendors are releasing various network processors (NPs) in order to handle these demands, but they are optimized mostly for data-plane throughput. As demands for control plane applications (e.g., quality of service) grow, efficient control plane processing will become increasingly important to good performance of network interfaces. In this paper, we explore acceleration techniques to improve the performance of control plane network applications. Three applications including media transcoding and transaction applications are analyzed in detail. Workload characterization shows that wide-issue configurations saturate early in performance, and sensitivity analysis shows that there is no common bottleneck among the applications. Therefore, we study giving each application its own hardware acceleration module in order to accomplish the required throughput on OC-768 or higher. Our approach includes an array-style accelerator for media transcoding applications and a partitioned lookup mechanism for lookup-table-related applications. Performance analysis of the proposed techniques shows significant improvement over the baseline configuration. Such hardware accelerators provide large packet-level parallelism proportional to the number of processing elements added. Our analyses of the proposed techniques suggest future directions for the design of high-performance NPs. </para>]]></description>
			<pubDate><![CDATA[Dec.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=4803735]]></guid>
			<volume>17</volume>
			<issue>12</issue>
			<startPage>1691</startPage>
			<endPage>1697</endPage>
			<fileSize>664</fileSize>
			<authors><![CDATA[Lee, B. K.;John, L. K.;]]></authors>
		</item>
		<item>
			<title><![CDATA[A 2.5-GHz Built-in Jitter Measurement System in a Serial-Link Transceiver]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=5152959]]></link>
			<description><![CDATA[<para> A 2.5-GHz built-in jitter measurement (BIJM) system is adopted to measure the clock jitter on a transmitter and receiver. The proposed Vernier caliper and autofocus approaches reduce the area cost of delay cells by 48.78% relative to a pure Vernier delay line structure with a wide measurement range. The counter circuit occupies an area of 19 <formula formulatype="inline"><tex Notation="TeX">$\mu$</tex> </formula>m <formula formulatype="inline"><tex Notation="TeX">$\times$</tex> </formula> 61 <formula formulatype="inline"><tex Notation="TeX">$\mu$</tex> </formula>m in the traditional stepping scan approach. The proposed equivalent-signal sampling technique removes the input jitter transfer path from the sampling clock. The power supply rejection design is incorporated into the delay cell and the judge circuit. The layout implementation, calibration, and test time of the proposed BIJM system are all improved. The core circuit occupies an area of only 0.5 mm <formula formulatype="inline"><tex Notation="TeX">$\times$</tex> </formula> 0.15 mm with the 90-nm CMOS process. Jitter with Gaussian and uniform distributions is verified at a 5-ps timing resolution and a 2.5-GHz input clock frequency. </para>]]></description>
			<pubDate><![CDATA[Dec.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=5152959]]></guid>
			<volume>17</volume>
			<issue>12</issue>
			<startPage>1698</startPage>
			<endPage>1708</endPage>
			<fileSize>1455</fileSize>
			<authors><![CDATA[Jiang, S-.Y.;Cheng, K-.H.;Jian, P-.Y.;]]></authors>
		</item>
		<item>
			<title><![CDATA[Floating-Point FPGA: Architecture and Modeling]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=4814468]]></link>
			<description><![CDATA[<para> This paper presents an architecture for a reconfigurable device that is specifically optimized for floating-point applications. Fine-grained units are used for implementing control logic and bit-oriented operations, while parameterized and reconfigurable word-based coarse-grained units incorporating word-oriented lookup tables and floating-point operations are used to implement datapaths. In order to facilitate comparison with existing FPGA devices, the virtual embedded block scheme is proposed to model embedded blocks using existing field-programmable gate array (FPGA) tools. This methodology involves adopting existing FPGA resources to model the size, position, and delay of the embedded elements. The standard design flow offered by FPGA and computer-aided design vendors is then applied and static timing analysis can be used to estimate the performance of the FPGA with the embedded blocks. On selected floating-point benchmark circuits, our results indicate that the proposed architecture can achieve four times improvement in speed and 25 times reduction in area compared with a traditional FPGA device. </para>]]></description>
			<pubDate><![CDATA[Dec.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=4814468]]></guid>
			<volume>17</volume>
			<issue>12</issue>
			<startPage>1709</startPage>
			<endPage>1718</endPage>
			<fileSize>1305</fileSize>
			<authors><![CDATA[Ho, C. H.;Yu, C. W.;Leong, P.;Luk, W.;Wilton, S. J. E.;]]></authors>
		</item>
		<item>
			<title><![CDATA[Improved Pervasive Sensing With RFID: An Ultra-Low Power Baseband Processor for UHF Tags]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=4804674]]></link>
			<description><![CDATA[<para> Recently, <emphasis emphasistype="boldital">radio frequency identification</emphasis> (RFID) systems have gained popularity in manufacturing units, inventory, and logistics, as they represent an inexpensive and reliable solution for automatic identification. Moreover, RFID transponders are expected to become a key element in the ubiquitous computing scenario. Tags will likely be used to collect sensor data, enabling noninvasive environment monitoring. Low-cost passive UHF transponders are expected to play a major role in this context, due to extended read range capabilities. Within a passive tag, power harvested from the field irradiated by the reader during the communication should operate both digital control circuitry and potential sensing devices. Exploiting ultra-low power tag circuitry would provide sensing sections with higher energy, thus improving measurement performance. In this paper, the design of a novel circuit is presented, which implements the baseband processor of a UHF-RFID tag in compliance with the ISO 18000-6B protocol. Regardless of protocol selection issues, several power saving strategies are devised, both at the system and circuit levels, suitable for passive transponder implementation. Near-threshold operation has been exploited to attain ultra-low power consumption while keeping fair performance. A set of standard cells has been designed, suitable for the power-limited specific application. The proposed solution has been successfully checked by means of a physical implementation on CMOS 0.18 <formula formulatype="inline"><tex Notation="TeX">$\mu{\hbox{m}}$</tex></formula> technology. Test chips have been characterized in terms of voltage and frequency operating range, and the power consumption figure has been extensively analyzed. Measurement results fully support the selected design approach: the baseband processor dissipates only 440 nW average power when operated at 800 kHz and 0.6 V. This extremely low power consumption enables high-performance ubiquitous computing. </para>]]></description>
			<pubDate><![CDATA[Dec.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=4804674]]></guid>
			<volume>17</volume>
			<issue>12</issue>
			<startPage>1719</startPage>
			<endPage>1729</endPage>
			<fileSize>2294</fileSize>
			<authors><![CDATA[Ricci, A.;Grisanti, M.;De Munari, I.;Ciampolini, P.;]]></authors>
		</item>
		<item>
			<title><![CDATA[Power Management Using Test-Pattern Ordering for Wafer-Level Test During Burn-In]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=4804673]]></link>
			<description><![CDATA[<para> Wafer-level test during burn-in (WLTBI) is a promising technique to reduce test and burn-in costs in semiconductor manufacturing. However, scan-based testing leads to significant power variations in a die during test-pattern application. This variation adversely affects the accuracy of predictions of junction temperatures and the time required for burn-in. We present a test-pattern ordering technique for WLTBI, where the objective is to minimize the variation in power consumption during test application. The test-pattern ordering problem for WLTBI is formulated and solved optimally using integer linear programming. Efficient heuristic methods are also presented to easily solve the pattern-ordering problem for large circuits. Simulation results are presented for the ISCAS'89 and the IWLS'05 benchmark circuits, and the proposed ordering technique is compared with two baseline methods that carry out pattern ordering to minimize peak power and average power, respectively. A third baseline method that randomly orders test patterns is also used to evaluate the proposed methods. </para>]]></description>
			<pubDate><![CDATA[Dec.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=4804673]]></guid>
			<volume>17</volume>
			<issue>12</issue>
			<startPage>1730</startPage>
			<endPage>1741</endPage>
			<fileSize>763</fileSize>
			<authors><![CDATA[Bahukudumbi, S.;Chakrabarty, K.;]]></authors>
		</item>
		<item>
			<title><![CDATA[A Low-Power, Fast Acquisition, Data Recovery Circuit With Digital Threshold Decision for SFI-5 Application]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=5238568]]></link>
			<description><![CDATA[<para> An all-digital clock and data recovery (CDR) circuit with a digital threshold decision updating technique for SFI-5 application is presented in this paper. The CDR updates its decision upon the phase error reaching a threshold value by examining the phase errors in the data bits within an examining window at the baud rate. High jitter tolerance performance is obtained and the phase acquisition can be achieved within one baud period. The proposed CDR is implemented with 900 transistors and the core CDR consumes 5 mW with a 1.2 V supply at 2.5 Gb/s. Measured results verify the digital threshold decision technique and its low-complexity implementation for SFI-5 application. </para>]]></description>
			<pubDate><![CDATA[Dec.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=5238568]]></guid>
			<volume>17</volume>
			<issue>12</issue>
			<startPage>1742</startPage>
			<endPage>1748</endPage>
			<fileSize>1385</fileSize>
			<authors><![CDATA[Du, Q.;Zhuang, J.;Kwasniewski, T.;]]></authors>
		</item>
		<item>
			<title><![CDATA[Gated Decap: Gate Leakage Control of On-Chip Decoupling Capacitors in Scaled Technologies]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=4806134]]></link>
			<description><![CDATA[<para> To minimize the leakage power dissipation of present-day on-chip Decaps, we propose a gated decoupling capacitor (GDecap) technique that deactivates a Decap when it is not needed. The application of the proposed GDecap technique on an eight-way clock-gated clustered pipeline showed that on average, 41.7% Decap leakage power was reduced, with negligible <formula formulatype="inline"> <tex Notation="TeX">$({\sim 0.037}\%)$</tex></formula> worst-case performance degradation, at the 70-nm technology node. GDecap design incurred an area overhead of around 5.36% when compared with a conventional Decap design. </para>]]></description>
			<pubDate><![CDATA[Dec.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=4806134]]></guid>
			<volume>17</volume>
			<issue>12</issue>
			<startPage>1749</startPage>
			<endPage>1752</endPage>
			<fileSize>630</fileSize>
			<authors><![CDATA[Chen, Y.;Li, H.;Roy, K.;Koh, C.-K.;]]></authors>
		</item>
		<item>
			<title><![CDATA[2009 Index IEEE Transactions on Very Large Scale Integration (VLSI) Systems Vol. 17]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=5337728]]></link>
			<description><![CDATA[ ]]></description>
			<pubDate><![CDATA[Dec.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=5337728]]></guid>
			<volume>17</volume>
			<issue>12</issue>
			<startPage>1753</startPage>
			<endPage>1776</endPage>
			<fileSize>237</fileSize>
			<authors><![CDATA[]]></authors>
		</item>
		<item>
			<title><![CDATA[IEEE Transactions on Very Large Scale Integration (VLSI) Systems society information]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=5337730]]></link>
			<description><![CDATA[ ]]></description>
			<pubDate><![CDATA[Dec.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=5337730]]></guid>
			<volume>17</volume>
			<issue>12</issue>
			<startPage>C3</startPage>
			<endPage>C3</endPage>
			<fileSize>27</fileSize>
			<authors><![CDATA[]]></authors>
		</item>
		<item>
			<title><![CDATA[IEEE Transactions on Very Large Scale Integration (VLSI) Systems information for authors]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=5337731]]></link>
			<description><![CDATA[ ]]></description>
			<pubDate><![CDATA[Dec.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5313696&arnumber=5337731]]></guid>
			<volume>17</volume>
			<issue>12</issue>
			<startPage>C4</startPage>
			<endPage>C4</endPage>
			<fileSize>28</fileSize>
			<authors><![CDATA[]]></authors>
		</item>
	</channel>
</rss>