<?xml version="1.0" ?>
<rss version="2.0">
	<channel>
		<title><![CDATA[ Very Large Scale Integration (VLSI) Systems, IEEE Transactions on - new TOC ]]></title>
		<link>http://ieeexplore.ieee.org</link>
		<description>TOC Alert for Publication# 92 </description>
		<year>2009</year>
		<month>November</month>
		<day>06</day>
		<item>
			<title><![CDATA[Table of contents]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=5290269]]></link>
			<description><![CDATA[ ]]></description>
			<pubDate><![CDATA[Nov.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=5290269]]></guid>
			<volume>17</volume>
			<issue>11</issue>
			<startPage>C1</startPage>
			<endPage>C1</endPage>
			<fileSize>43</fileSize>
			<authors><![CDATA[]]></authors>
		</item>
		<item>
			<title><![CDATA[IEEE Transactions on Very Large Scale Integration (VLSI) Systems publication information]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=5290314]]></link>
			<description><![CDATA[ ]]></description>
			<pubDate><![CDATA[Nov.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=5290314]]></guid>
			<volume>17</volume>
			<issue>11</issue>
			<startPage>C2</startPage>
			<endPage>C2</endPage>
			<fileSize>40</fileSize>
			<authors><![CDATA[]]></authors>
		</item>
		<item>
			<title><![CDATA[A Graph Drawing Based Spatial Mapping Algorithm for Coarse-Grained Reconfigurable Architectures]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=4801596]]></link>
			<description><![CDATA[<para> Recently, coarse-grained reconfigurable architectures (CGRAs) have drawn increasing attention due to their efficiency and flexibility. While many CGRAs have demonstrated impressive performance improvements, the effectiveness of CGRA platforms ultimately hinges on the compiler. Existing CGRA compilers do not model the details of the CGRA, and thus they i) are unable to map applications even though a mapping exists, and ii) use too many processing elements (PEs) to map an application. In this paper, we model several CGRA details, e.g., irregular CGRA topologies, shared resources, and routing PEs, in our compiler and develop a graph drawing based approach, Split-Push Kernel Mapping (SPKM), for mapping applications onto CGRAs. On randomly generated graphs, our technique can map on average 4.5<formula formulatype="inline"><tex Notation="TeX">$\times$</tex> </formula> more applications than the previous approach, while generating mappings of better quality in terms of utilized CGRA resources. Utilizing fewer resources translates directly into increased opportunities for novel power and performance optimization techniques. Our technique shows less power consumption in 71 cases and shorter execution cycles in 66 cases out of 100 synthetic applications, with minimum mapping time overhead. We observe similar results on a suite of benchmarks collected from Livermore loops, Mediabench, Multimedia, Wavelet and DSPStone benchmarks. SPKM is not customized for a single CGRA template, as demonstrated by exploring various PE interconnection topologies and shared resource configurations with SPKM. </para>]]></description>
			<pubDate><![CDATA[Nov.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=4801596]]></guid>
			<volume>17</volume>
			<issue>11</issue>
			<startPage>1565</startPage>
			<endPage>1578</endPage>
			<fileSize>2420</fileSize>
			<authors><![CDATA[Yoon, J. W.;Shrivastava, A.;Park, S.;Ahn, M.;Paek, Y.;]]></authors>
		</item>
		<item>
			<title><![CDATA[A DLL Design for Testing I/O Setup and Hold Times]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=4806135]]></link>
			<description><![CDATA[<para> A built-in self-test (BIST) circuit has been designed to test setup and hold times of I/O registers or buffers for memory interfaces. This method enables independent testing of setup and hold times without using an external tester, except to generate the reference clock. The circuit uses a delay-locked loop (DLL) to generate delayed clocks. It has been implemented with a 0.18-<formula formulatype="inline"><tex Notation="TeX">$\mu$</tex></formula>m TSMC process (CM018). The accuracy in delay generation is within 40 ps, for delay measurements ranging from 300 to 700 ps. In order to achieve high accuracy, the BIST circuit requires frequency adjustment during test, combined with averaging over multiple test cycles. To do this in an efficient manner, the DLL in the BIST circuit has been designed for a wide lock range, from 150 to 400 MHz, and achieves lock in less than 0.05 <formula formulatype="inline"><tex Notation="TeX">$\mu$</tex></formula>s. This paper describes the design in detail and evaluates its performance, together with test time and accuracy. It also shows how to use a low-resolution DLL to achieve high accuracy through frequency adjustment and averaging over multiple test cycles. </para>]]></description>
			<pubDate><![CDATA[Nov.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=4806135]]></guid>
			<volume>17</volume>
			<issue>11</issue>
			<startPage>1579</startPage>
			<endPage>1592</endPage>
			<fileSize>1109</fileSize>
			<authors><![CDATA[Jia, C.;Milor, L.;]]></authors>
		</item>
		<item>
			<title><![CDATA[Implementing Multiphase Resonant Clocking on a Finite-Impulse Response Filter]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=4895681]]></link>
			<description><![CDATA[<para> Rotary clock is a resonant clocking technique that delivers on-chip clock signal distribution with very low power dissipation. Since it can only generate clock signals with multiple phases that are spatially distributed, rotary clock is often considered not applicable to industrial very large scale integration (VLSI) designs. This paper presents the first rotary-clock-based nontrivial digital circuit. Our design, a low-power and high-speed finite-impulse response (FIR) filter, is fully digital and generated using CMOS standard cells in 0.18 <formula formulatype="inline"><tex Notation="TeX">$\mu{\hbox{m}}$</tex></formula> technology. We have shown that the proposed FIR filter is seamlessly integrated with the rotary clock technique. It uses the spatially distributed multiple clock phases of rotary clock and achieves high power savings. Simulation results demonstrate that our rotary-clock-based FIR filter can operate successfully at 610 MHz, providing a throughput of 39 Gb/s. In comparison with the conventional clock-tree-based design, our design achieves a 34.6% clocking power saving and a 12.8% overall circuit power saving. In addition, the peak current consumed by the rotary-clock-based filter is substantially lower by 40% on the average. Our study makes the crucial step toward the application of rotary clock technique to a broad range of VLSI designs. </para>]]></description>
			<pubDate><![CDATA[Nov.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=4895681]]></guid>
			<volume>17</volume>
			<issue>11</issue>
			<startPage>1593</startPage>
			<endPage>1601</endPage>
			<fileSize>1185</fileSize>
			<authors><![CDATA[Yu, Z.;Liu, X.;]]></authors>
		</item>
		<item>
			<title><![CDATA[Backward Interpolation Architecture for Algebraic Soft-Decision Reed&#x2013;Solomon Decoding]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=4837873]]></link>
			<description><![CDATA[<para> Recently developed algebraic soft-decision (ASD) decoding algorithms for Reed&#x2013;Solomon (RS) codes have attracted much interest because they can achieve significant coding gain with polynomial complexity. One major step of ASD decoding is the interpolation. Available interpolation algorithms can only add interpolation points or increase interpolation multiplicities. However, backward interpolation, which eliminates interpolation points or reduces interpolation multiplicities, is indispensable to enable the reuse of interpolation results in the following two scenarios: 1) interpolation needs to be carried out on multiple test vectors that share common entries, and 2) iterative ASD decoding where interpolation points have decreasing multiplicities. Examples for these cases are the low-complexity Chase (LCC) decoding and bit-level generalized minimum distance (BGMD) decoding. With lower complexity, these algorithms can achieve similar or higher coding gain than other practical ASD algorithms. In this paper, we propose novel backward interpolation schemes and corresponding efficient implementation architectures for LCC and BGMD decoding through constructing equivalent Gr&#x00F6;bner bases. The proposed architectures share computational units with forward interpolation architectures. Hence, the area overhead for incorporating the backward interpolation is very small. Substantial area saving or speedup can be achieved by using the backward interpolation. When the proposed architecture is applied to the LCC decoding of a (255, 239) RS code with <formula formulatype="inline"><tex Notation="TeX">$\eta=3$</tex></formula>, the area is reduced to 39% of that required by prior architectures. In terms of speed/area ratio, the proposed architecture is 48% more efficient than the best available architecture. For the BGMD decoding of the same code, the proposed architecture can achieve around 20% higher efficiency. </para>]]></description>
			<pubDate><![CDATA[Nov.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=4837873]]></guid>
			<volume>17</volume>
			<issue>11</issue>
			<startPage>1602</startPage>
			<endPage>1615</endPage>
			<fileSize>590</fileSize>
			<authors><![CDATA[Zhu, J.;Zhang, X.;Wang, Z.;]]></authors>
		</item>
		<item>
			<title><![CDATA[Adaptive Frequency-Domain Channel Estimator in 4<formula formulatype="inline"><tex Notation="TeX">$\,\times\,$</tex></formula>4 MIMO-OFDM Modems]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=4811933]]></link>
			<description><![CDATA[<para> This work presents an adaptive frequency-domain channel estimator (FD-CE) for equalization of space&#x2013;time block code multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) systems in time-varying frequency-selective fading. The proposed adaptive FD-CE ensures the channel estimation accuracy in each set of four MIMO-OFDM symbols. Performance evaluation shows that the proposed method achieved a 10% packet error rate of 64 quadrature amplitude modulation (QAM) at 29.5 dB SNR under 120 km/h (Doppler shift is 266 Hz) in 4<formula formulatype="inline"><tex Notation="TeX">$\,\times\,$</tex></formula>4 MIMO-OFDM systems. To decrease complexity, the rich feature of Alamouti-like matrix is exploited to derive an efficient very large-scale integration (VLSI) solution. Finally, this adaptive FD-CE using an in-house 0.13-<formula formulatype="inline"><tex Notation="TeX">$\mu$</tex> </formula>m CMOS library occupies an area of 3<formula formulatype="inline"> <tex Notation="TeX">$\,\times\,$</tex></formula>3.1 mm<formula formulatype="inline"> <tex Notation="TeX">$^{2}$</tex></formula>, and the 4<formula formulatype="inline"> <tex Notation="TeX">$\,\times\,$</tex></formula>4 MIMO-OFDM modem consumes about 62.8 mW at 1.2 V supply voltage. </para>]]></description>
			<pubDate><![CDATA[Nov.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=4811933]]></guid>
			<volume>17</volume>
			<issue>11</issue>
			<startPage>1616</startPage>
			<endPage>1625</endPage>
			<fileSize>1639</fileSize>
			<authors><![CDATA[Sun, M.-F.;Juan, T.-Y.;Lin, K.-S.;Hsu, T.-Y.;]]></authors>
		</item>
		<item>
			<title><![CDATA[Crosstalk-Aware Channel Coding Schemes for Energy Efficient and Reliable NOC Interconnects]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=4801555]]></link>
			<description><![CDATA[<para> Network-on-chip (NOC) is emerging as a revolutionary methodology to integrate numerous intellectual property blocks in a single die. It is the packet switching-based communications backbone that interconnects the components on multicore system-on-chip (SoC). A major challenge that NOC design is expected to face is related to the intrinsic unreliability of the interconnect infrastructure under technology limitations. By incorporating error control coding schemes along the interconnects, NOC architectures are able to provide correct functionality in the presence of different sources of transient noise and yet have lower overall energy dissipation. In this paper, designs of novel joint crosstalk avoidance and triple-error-correction/quadruple-error-detection codes are proposed, and their performance is evaluated in different NOC fabrics. It is demonstrated that the proposed codes outperform other existing coding schemes in making NOC fabrics reliable and energy efficient, with lower latency. </para>]]></description>
			<pubDate><![CDATA[Nov.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=4801555]]></guid>
			<volume>17</volume>
			<issue>11</issue>
			<startPage>1626</startPage>
			<endPage>1639</endPage>
			<fileSize>855</fileSize>
			<authors><![CDATA[Ganguly, A.;Pande, P. P.;Belzer, B.;]]></authors>
		</item>
		<item>
			<title><![CDATA[A Framework for Power-Gating Functional Units in Embedded Microprocessors]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=4806137]]></link>
			<description><![CDATA[<para> Power gating is a technique commonly used for leakage reduction in integrated circuits. In microprocessors, power gating is implemented by using sleep transistors to selectively deactivate circuit modules that remain idle for sustained periods of time during program execution. In this work, we develop a new framework for power gating the functional units in embedded system microprocessors without degradation in performance. The proposed framework includes an efficient algorithm for idle time estimation, appropriate insertion of sleep instructions within the code, and a method for reactivating the sleeping units only when needed <emphasis emphasistype="boldital">without the use of wakeup instructions</emphasis>. We introduce the notion of <emphasis emphasistype="boldital">loop hierarchy trees</emphasis> (LHTs) to represent the partial ordering of the nested loops within the program. From the control flow graph (CFG) representation of the source program, a forest of LHTs is constructed and is used to identify the maximal subgraphs representing the long idle periods for the functional units. For each subgraph thus identified, a sleep instruction is introduced in the program with a list of corresponding functional units to be deactivated. When an instruction is decoded, the functional units needed for that instruction are automatically activated by the control unit such that the units are ready before the instruction reaches the execute stage. This eliminates the need for wakeup instructions to be inserted into the object code, reducing the overheads. In our implementation, the ARM processor architecture was modified and resynthesized to include power gating by developing a CMOS cell library of functional units with the above capabilities. Experimental results are reported for a set of 12 benchmarks chosen from the MiBench suite, which indicate that, on average, our technique reduces the leakage energy in functional units by 31.1% for integer benchmarks and 26.8% for floating-point benchmarks. </para>]]></description>
			<pubDate><![CDATA[Nov.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=4806137]]></guid>
			<volume>17</volume>
			<issue>11</issue>
			<startPage>1640</startPage>
			<endPage>1649</endPage>
			<fileSize>654</fileSize>
			<authors><![CDATA[Roy, S.;Ranganathan, N.;Katkoori, S.;]]></authors>
		</item>
		<item>
			<title><![CDATA[Internal and External Bitstream Relocation for Partial Dynamic Reconfiguration]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=4801552]]></link>
			<description><![CDATA[<para> The research described in this paper shows how the runtime relocation of a reconfigurable component can be obtained using a system component that is able to update the bitstream information, moving the reconfigurable module in the desired position. This scenario defines the so-called partial bitstream relocation activity. This paper proposes a relocation filter that can be implemented both as a hardware and a software component. The former is hosted in the static part of the reconfigurable architecture, while the latter is made to be run on the processor placed on the field-programmable gate array (FPGA). The proposed approach has also been validated over different FPGAs, i.e., Virtex II Pro, Virtex 4, and Virtex 5, proposing a runtime relocation support that can be customized to meet all the different constraints associated with these different target architectures. </para>]]></description>
			<pubDate><![CDATA[Nov.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=4801552]]></guid>
			<volume>17</volume>
			<issue>11</issue>
			<startPage>1650</startPage>
			<endPage>1654</endPage>
			<fileSize>199</fileSize>
			<authors><![CDATA[Corbetta, S.;Morandi, M.;Novati, M.;Santambrogio, M. D.;Sciuto, D.;Spoletini, P.;]]></authors>
		</item>
		<item>
			<title><![CDATA[Effective Diagnostic Pattern Generation Strategy for Transition-Delay Faults in Full-Scan SOCs]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=4801556]]></link>
			<description><![CDATA[<para> Nanometric circuits and systems are increasingly susceptible to delay defects. This paper describes a strategy for the diagnosis of transition-delay faults in full-scan systems-on-a-chip (SOCs). The proposed methodology takes advantage of a suitably generated software-based self-test test set and of the scan-chains included in the final SOC design. Effectiveness and feasibility of the proposed approach were evaluated on a nanometric SOC test vehicle including an 8-bit microcontroller, some memory blocks and an arithmetic core, manufactured by STMicroelectronics. Results show that the proposed technique can achieve high diagnostic resolution while maintaining a reasonable application time. </para>]]></description>
			<pubDate><![CDATA[Nov.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=4801556]]></guid>
			<volume>17</volume>
			<issue>11</issue>
			<startPage>1654</startPage>
			<endPage>1659</endPage>
			<fileSize>432</fileSize>
			<authors><![CDATA[Appello, D.;Bernardi, P.;Grosso, M.;Sanchez, E.;Sonza Reorda, M.;]]></authors>
		</item>
		<item>
			<title><![CDATA[Energy-Efficient Dual-Edge-Triggered Level Converting Flip Flops With Symmetry in Setup Times and Insensitivity to Output Parasitics]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=4814496]]></link>
			<description><![CDATA[<para> Level converting flip-flops (LCFFs) are crucial components for multisupply systems as interfaces between different voltage islands. The proposed energy-efficient LCFFs reduce the power consumption of clock networks with dual-edge triggering, support sleep mode of power management mechanisms with data retention, and have symmetry in setup times and insensitivity to output parasitics. With all these features, the proposed LCFFs have 19% and 38% lower power-delay product than the conventional LCFF, as demonstrated by postlayout simulation results. </para>]]></description>
			<pubDate><![CDATA[Nov.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=4814496]]></guid>
			<volume>17</volume>
			<issue>11</issue>
			<startPage>1659</startPage>
			<endPage>1663</endPage>
			<fileSize>506</fileSize>
			<authors><![CDATA[Chiou, L.-Y.;Luo, S.-C.;]]></authors>
		</item>
		<item>
			<title><![CDATA[ISCAS 2010 nano-bio circuit fabrics and systems]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=5290316]]></link>
			<description><![CDATA[ ]]></description>
			<pubDate><![CDATA[Nov.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=5290316]]></guid>
			<volume>17</volume>
			<issue>11</issue>
			<startPage>1664</startPage>
			<endPage>1664</endPage>
			<fileSize>642</fileSize>
			<authors><![CDATA[]]></authors>
		</item>
		<item>
			<title><![CDATA[IEEE Transactions on Very Large Scale Integration (VLSI) Systems society information]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=5290315]]></link>
			<description><![CDATA[ ]]></description>
			<pubDate><![CDATA[Nov.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=5290315]]></guid>
			<volume>17</volume>
			<issue>11</issue>
			<startPage>C3</startPage>
			<endPage>C3</endPage>
			<fileSize>27</fileSize>
			<authors><![CDATA[]]></authors>
		</item>
		<item>
			<title><![CDATA[IEEE Transactions on Very Large Scale Integration (VLSI) Systems information for authors]]></title>
			<link><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=5290313]]></link>
			<description><![CDATA[ ]]></description>
			<pubDate><![CDATA[Nov.  2009]]></pubDate>
			<guid><![CDATA[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=5290268&arnumber=5290313]]></guid>
			<volume>17</volume>
			<issue>11</issue>
			<startPage>C4</startPage>
			<endPage>C4</endPage>
			<fileSize>28</fileSize>
			<authors><![CDATA[]]></authors>
		</item>
	</channel>
</rss>