800MB/s DDR NAND Flash Memory Multi-Chip Package Internal Architecture.



Conventional NAND flash memory employs a parallel multi-drop bus architecture to connect multiple devices to a controller. As bus speeds have increased with the emergence of high-speed Open NAND Flash Interface (ONFI) [1] and toggle-mode [2] NAND flash devices, the limitations of the multi-drop bus have become apparent. As speed increases, the number of parallel loads that can be supported is reduced. On-die termination is used to mitigate the loading effects at the cost of increased power consumption. To support high-capacity, high-performance solid-state drives (SSDs), a large number of memory channels, each supporting only 4–8 NAND devices, will be required.

A ring topology was originally proposed for DRAM main memory in the IEEE RamLink standard [3]. Higher speed operation is possible because each device drives only a single load, rather than multiple loads as in a parallel bus topology. A drawback of the ring topology in DRAM applications is that each stage in the ring adds latency, which is critical for processor main memory performance.

HyperLink NAND or HLNAND [4] was introduced to overcome the performance and scalability issues of conventional NAND flash parallel bus architectures. Since internal NAND read and program operations take on the order of 100 $\mu{\rm s}$ and 1 ms respectively, the additional nanosecond-scale latency of the ring topology is negligible. HLNAND also introduced a fully packetized command and address format with DDR data transfers. Multiple chip enable signals (CE#) were eliminated by a device ID byte in the command packet. While each device in the HyperLink ring requires a separate 8-bit input and 8-bit output as opposed to a single 8-bit bidirectional input/output bus in the case of conventional NAND, the number of controller pins required per channel for high capacity configurations remains about the same due to the reduction in CE# pins.
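
The pin-count argument can be illustrated with a rough tally. The sketch below is only an approximation: it counts data pins and chip-enable pins, assumes one CE# per device on the conventional bus, and ignores clocks, strobes and other control signals. The function names are illustrative and do not come from any HLNAND specification.

```python
def conventional_pins(devices: int, data_bits: int = 8) -> int:
    """Shared bidirectional data bus plus one CE# per device (assumed)."""
    return data_bits + devices


def hlnand_pins(data_bits: int = 8) -> int:
    """Separate unidirectional output and input buses, no per-device CE#."""
    return 2 * data_bits


# Compare the two tallies as the number of devices per channel grows.
for n in (1, 4, 8):
    print(f"{n} devices: conventional={conventional_pins(n)}, "
          f"HLNAND={hlnand_pins()}")
```

Under these assumptions the two counts meet at 8 devices per channel (16 pins each), which is consistent with the statement that the controller pin count is about the same for high capacity configurations.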

An HLNAND Multi-Chip Package (MCP) employing a stack of 8 conventional commercially available parallel bus NAND devices and a custom bridge chip interfacing to the external ring was developed [5]. The bridge chip allows slower parallel-bus NAND devices to communicate with the 300 MB/s high speed ring, providing concurrent operations over 4 internal interfaces within a single package. This MCP approach isolates individual memory die from the ring so that they do not contribute to power dissipation. Although the data circulates around the ring through point-to-point connections, the 150 MHz DDR clock is delivered in a parallel multi-drop fashion. Only 4–8 loads can be driven by a single clock, and larger configurations would require multiple parallel clocks. Also, DDR operation beyond 300 MB/s becomes challenging when clock and data do not have identical signal paths and loads. To achieve higher speed operation with a single clock, we introduce in this paper a source synchronous clocking scheme for the HyperLink ring architecture. A new 90 nm bridge chip was developed and fabricated for this work.
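
The relation between clock frequency and quoted bandwidth follows from DDR signaling on an 8-bit bus: two transfers per clock cycle, one byte per transfer. A minimal sketch of that arithmetic (the function name is illustrative, and the 400 MHz figure is inferred here from the 800 MB/s rate rather than stated in the text):

```python
def ddr_bandwidth_mb_s(clock_mhz: float, bus_bits: int = 8) -> float:
    """Peak rate in MB/s for a DDR bus: two transfers per clock cycle."""
    transfers_per_us = 2 * clock_mhz          # million transfers per second
    return transfers_per_us * bus_bits / 8    # bytes per transfer = bus_bits / 8


print(ddr_bandwidth_mb_s(150))  # earlier ring with 150 MHz DDR clock
print(ddr_bandwidth_mb_s(400))  # clock inferred for an 800 MB/s ring
```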



Fig. 1 shows an HLNAND source synchronous ring topology configuration with 8 flash MCPs. Clock and data originate from the same source and terminate at the same destination with matched drivers and receivers so that their phase relationship is maintained over the full range of operating conditions and system configurations. Signals originate from the controller and are regenerated in each MCP as they circulate around the ring back to the controller. A single-ended 8 bit bus for control, address and data information D[7:0]—Q[7:0] is synchronized with a differential source synchronous clock CKI/CKI#—CKO/CKO#. The differential clock allows better control of clock duty cycle. An active CSI—CSO strobe signal indicates the presence of a command packet on the D[7:0]—Q[7:0] bus. An active DSI—DSO strobe signal indicates the presence of a data packet on the D[7:0]—Q[7:0] bus. Two low speed signals are provided from the controller to the MCPs in parallel. Chip enable CE# allows the ring to be powered down and clocks to be suspended for low power standby while reset R# initializes the devices on power up.

Fig. 1. HLNAND source synchronous ring topology.

A simplified schematic of the source synchronous bridge chip clocking and input/output circuitry is shown in Fig. 2. The actual circuit contains delay compensation in the PLL feedback path and dummy circuits for clock and data delay paths to provide matching essential for high speed operation. Input strobe signals DSI and CSI pass through to DSO and CSO through circuits similar to the D[7:0] to Q[7:0] path except the multiplexers to overwrite incoming signals with read data are not required.

Fig. 2. Source synchronous bridge chip clocking and I/O circuits.

Fig. 3 shows the inputs and outputs of a single MCP after power up. Edges of the DDR data inputs D[7:0] and outputs Q[7:0] are aligned with edges of the clock inputs CKI/CKI# and clock outputs CKO/CKO#. A differential clock input buffer provides the internal clock ckin with a delay tdi. Input data passes through a matched input buffer employing reference voltage vref to provide internal input data din with matched delay tdi. A PLL locked to 2x the input frequency is used to regenerate ckin for duty cycle correction and provide a 90$^{\circ}$ delay to create a sampling clock ckint centered in the din data valid window. A PLL rather than a DLL is used because the PLL filters jitter outside the loop bandwidth. With a DLL the jitter would accumulate around the ring. Output data and clocks are generated from edges of ckint with a delay of tdo. The total delay from input to output varies with operating conditions but is matched for data and clocks at ${\rm tdi}+{\rm td}90+{\rm tdo}+{\rm td}180$.
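
To see why a 90° delay centers the sampling clock, it helps to put numbers on the clock period. The sketch below assumes a 400 MHz input clock (inferred from the 800 Mb/s/pin rate) and treats the 90° delay as exactly one quarter of the clock period. The function and key names are illustrative.

```python
def ddr_clock_timing(clock_mhz: float) -> dict:
    """Derive basic DDR timing figures (in ns) from the clock frequency."""
    period_ns = 1000.0 / clock_mhz
    return {
        "period_ns": period_ns,
        "bit_time_ns": period_ns / 2,  # DDR: one bit per clock edge
        "td90_ns": period_ns / 4,      # 90 degree phase delay
        "pll_mhz": 2 * clock_mhz,      # PLL locked to 2x the input frequency
    }


timing = ddr_clock_timing(400.0)
for name, value in timing.items():
    print(f"{name}: {value}")
```

With edge-aligned clock and data, each bit is valid for half a clock period, so a quarter-period shift of the clock lands halfway through the bit time.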

Fig. 3. HLNAND MCP signals after power-up.

To save power, the PLL can be shut down in alternate MCPs by employing edge-aligned clock and data between odd and even numbered MCPs and center-aligned clock and data between even and odd numbered MCPs. The odd numbered MCPs have their PLLs shut down by a command from the controller, which also reconfigures the inputs to be sampled directly with the received center-aligned input clock. The even numbered MCPs receive a command to reconfigure their outputs to provide center-aligned clock and data by inserting an additional 90° delay on the outbound clock. The center-aligned clock on the output of the even devices compensates for the disabled PLL in the odd devices. The signals at odd and even MCPs in PLL power saving mode are shown in Fig. 4.

Fig. 4. HLNAND MCP signals with alternating PLLs disabled for reduced power.

In the odd devices the input sampling clock ckint is identical to the received input clock ckin. Although the delay through even and odd devices is different, the average delay per stage remains ${\rm tdi}+{\rm td}90+{\rm tdo}+{\rm td}180$. After power-up and synchronization the controller can measure the total delay around the ring by observing the delay of the command strobe CSI. If there is a single device or an odd number of devices in the ring, the controller does not require a DLL or PLL to create a 90° phase shifted sampling clock, because the return clock is already centered in the data valid window.
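
A rough estimate of the accumulated ring delay can be obtained by multiplying the average per-stage delay by the number of stages. The sketch below is illustrative only: the buffer delays tdi and tdo of 0.5 ns each are hypothetical placeholders, td90 and td180 are assumed to be a quarter and a half of the clock period, the 400 MHz clock is inferred from the 800 Mb/s/pin rate, and board trace and controller delays are ignored.

```python
def stage_delay_ns(tdi_ns: float, tdo_ns: float, clock_mhz: float) -> float:
    """Average per-stage delay tdi + td90 + tdo + td180 (assumed phases)."""
    period_ns = 1000.0 / clock_mhz
    td90_ns = period_ns / 4    # assumed: quarter clock period
    td180_ns = period_ns / 2   # assumed: half clock period
    return tdi_ns + td90_ns + tdo_ns + td180_ns


def ring_delay_ns(stages: int, tdi_ns: float, tdo_ns: float,
                  clock_mhz: float) -> float:
    """Total delay around a ring of identical stages."""
    return stages * stage_delay_ns(tdi_ns, tdo_ns, clock_mhz)


# Hypothetical buffer delays; only the order of magnitude matters here.
total_ns = ring_delay_ns(stages=8, tdi_ns=0.5, tdo_ns=0.5, clock_mhz=400.0)
read_time_ns = 100_000.0  # NAND page read on the order of 100 microseconds
print(f"ring delay: {total_ns} ns, "
      f"fraction of read time: {total_ns / read_time_ns:.5f}")
```

Even with 8 stages, the estimate stays in the tens of nanoseconds, several orders of magnitude below the NAND array read time.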



The internal architecture of the 32 GB MCP is shown in Fig. 5. The bridge chip implemented in a 90 nm CMOS logic process interfaces to eight 27 nm 32 Gb 133 Mb/s/pin toggle-mode MLC NAND flash devices. The bridge chip can accommodate standard asynchronous NAND and ONFI NAND in addition to toggle-mode by programming the appropriate bond option. Four separate internal channels allow independent bank data transfer operations to the bridge chip to better exploit the bandwidth capabilities of multiple flash devices. Each internal channel supports 2 NAND die within the 8 die stack, although a single die per internal channel with a 4 die stack is also an option.
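
The capacity and internal bandwidth figures can be cross-checked with simple arithmetic. The sketch below assumes each internal toggle-mode channel is 8 bits wide, which the text does not state explicitly. The constant names are illustrative.

```python
DIE_COUNT = 8                  # NAND die in the stack
DIE_CAPACITY_GBIT = 32         # capacity of each die
INTERNAL_CHANNELS = 4          # independent channels to the bridge chip
NAND_RATE_MBIT_PER_PIN = 133   # toggle-mode rate per pin
NAND_BUS_BITS = 8              # assumption: 8-bit internal NAND interface

capacity_gbit = DIE_COUNT * DIE_CAPACITY_GBIT
capacity_gbyte = capacity_gbit // 8

channel_mb_s = NAND_RATE_MBIT_PER_PIN * NAND_BUS_BITS / 8
aggregate_mb_s = INTERNAL_CHANNELS * channel_mb_s

print(f"MCP capacity: {capacity_gbit} Gb = {capacity_gbyte} GB")
print(f"Internal bandwidth: {INTERNAL_CHANNELS} x {channel_mb_s} MB/s "
      f"= {aggregate_mb_s} MB/s")
```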

Fig. 5. HLNAND MCP internal architecture.

The bridge chip includes page buffers for each channel to mirror the data in the local NAND page buffers. During an HLNAND page read operation a local page read command is issued to the selected NAND device followed by a burst data read command to load data into the bridge chip page buffer. The data is then available for a subsequent HLNAND burst data read command. Similarly, an HLNAND program command is translated to a local burst data load command to transfer page buffer contents to the selected NAND device followed by a local program command. In this way the operation of the bridge chip page buffer is transparent to the system and hidden within tR and tPROG intervals.
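
The command translation described above can be sketched as a toy behavioral model. This is not the bridge chip's actual logic: the class and method names are hypothetical, timing (tR, tPROG) is not modeled, and pages are plain byte strings.

```python
from dataclasses import dataclass, field


@dataclass
class NandDie:
    """Toy NAND die: page storage plus its local page buffer."""
    pages: dict = field(default_factory=dict)
    page_buffer: bytes = b""


class BridgeChip:
    """Toy bridge with one page buffer per internal channel (hypothetical)."""

    def __init__(self, channels: int = 4, dies_per_channel: int = 2):
        self.dies = [[NandDie() for _ in range(dies_per_channel)]
                     for _ in range(channels)]
        self.page_buffers = [b""] * channels
        self.local_log = []  # local NAND commands, in issue order

    def page_read(self, channel: int, die: int, page: int) -> None:
        """HLNAND page read: local page read, then local burst data read."""
        nand = self.dies[channel][die]
        nand.page_buffer = nand.pages.get(page, b"")    # array -> NAND buffer
        self.local_log.append(("page_read", channel, die, page))
        self.page_buffers[channel] = nand.page_buffer   # NAND -> bridge buffer
        self.local_log.append(("burst_read", channel, die))

    def burst_read(self, channel: int) -> bytes:
        """HLNAND burst data read: served from the bridge page buffer."""
        return self.page_buffers[channel]

    def burst_load(self, channel: int, data: bytes) -> None:
        """HLNAND burst data load: fills the bridge page buffer."""
        self.page_buffers[channel] = data

    def page_program(self, channel: int, die: int, page: int) -> None:
        """HLNAND program: local burst data load, then local program."""
        nand = self.dies[channel][die]
        nand.page_buffer = self.page_buffers[channel]   # bridge -> NAND buffer
        self.local_log.append(("burst_load", channel, die))
        nand.pages[page] = nand.page_buffer             # NAND buffer -> array
        self.local_log.append(("program", channel, die, page))


bridge = BridgeChip()
bridge.burst_load(0, b"hello")
bridge.page_program(channel=0, die=1, page=7)
bridge.burst_load(0, b"")                 # clear the bridge buffer
bridge.page_read(channel=0, die=1, page=7)
print(bridge.burst_read(0))
print(bridge.local_log)
```

The log shows that each ring-level command expands into two local NAND commands, while the caller only ever touches the bridge page buffer.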

The external 800 Mb/s/pin HyperLink interface employs JEDEC standard un-terminated HSUL 1.2 V signaling with drivers calibrated within a range of 30–50 $\Omega$ with an external ZQ reference resistor, similar to DDR3 DRAM. The MCP requires 3 different supply voltages: 3.3 V for the NAND core, 1.8 V for the toggle-mode NAND interface, and 1.2 V for the HyperLink HSUL interface.

Fig. 6 shows a de-encapsulated 100-pin 14 mm × 18 mm BGA package with 8 stacked 32 Gb MLC NAND Flash devices and the 800 MB/s bridge chip, while Fig. 7 shows the bridge chip measuring 5.07 mm × 2.28 mm. Since the flash devices are relatively narrow, the bridge chip is placed directly on the package substrate. With larger NAND die the bridge chip can be placed on top of the NAND stack. NAND devices have pads along a single short side of the die. The devices are staggered to allow bond wires to connect to 4 die on the left and 4 die on the right. The outer bond pads of the bridge chip connect to two internal NAND channels on each side through the package substrate while the inner bond wires connect to package balls for the external HyperLink interface.

Fig. 6. De-encapsulated 100-pin BGA package with 8 stacked 32 Gb MLC NAND flash devices and 800 MB/s bridge chip.
Fig. 7. 800 MB/s HLNAND bridge chip.


A test board, shown in Fig. 8, was developed to characterize the performance of a ring of 8 MCPs. A pseudo-random data source is connected through SMA connectors to fully exercise the channel and provide crosstalk. The 800 Mb/s data eye shown in Fig. 9 was measured at the output of the last device in the ring and shows good vertical opening and low timing jitter. The schmoo plot in Fig. 10 shows error-free DDR834 operation with a 417 MHz clock and a 1.03 V supply voltage at room temperature.

Fig. 8. 800 MB/s HLNAND ring test board.
Fig. 9. 800 Mb/s eye diagram from last device in the ring.
Fig. 10. Schmoo plot showing error free DDR834 operation at 417 MHz, 1.03 V, room temperature.


A custom 90 nm HLNAND bridge chip with PLL-based DDR timing generation and self-calibrated driver impedance was developed to support 800 MB/s data transactions. The bridge chip interfaces to 8 stacked conventional NAND die within a single MCP to deliver higher performance and scalability than un-buffered NAND devices could provide. Table I summarizes the key features of the 256 Gb NAND Flash MCP. HLNAND provides a higher bandwidth channel than standalone NAND devices due to the point-to-point unidirectional ring architecture. Lower power is achieved through the use of 1.2 V un-terminated I/O, single point loads, and a hierarchical architecture where the NAND devices are physically isolated from the main channel. Using the conventional NAND interface, a maximum of 8 die can be connected to a single controller channel. The HLNAND MCP allows 64 or more die to be supported by a single channel to enable cost-effective multi-TB SSDs.
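
The scalability claim can be made concrete with a worked example. The sketch below assumes a hypothetical 2 TB drive built from 32 Gb die, uses binary units, ignores over-provisioning and spare capacity, and takes 64 die per HLNAND channel as a ring of 8 MCPs with 8 die each. The function names are illustrative.

```python
import math


def die_for_capacity(capacity_tb: int, die_gbit: int = 32) -> int:
    """Number of die needed for a raw capacity (binary units, no spares)."""
    total_gbit = capacity_tb * 1024 * 8
    return math.ceil(total_gbit / die_gbit)


def channels_needed(total_die: int, die_per_channel: int) -> int:
    """Controller channels needed when each channel supports a fixed die count."""
    return math.ceil(total_die / die_per_channel)


total_die = die_for_capacity(2)                 # hypothetical 2 TB drive
conventional = channels_needed(total_die, 8)    # 8 die per conventional channel
hlnand = channels_needed(total_die, 8 * 8)      # 8 MCPs x 8 die per ring
print(f"{total_die} die: {conventional} conventional channels "
      f"vs {hlnand} HLNAND channels")
```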

Table 1


The authors thank Silicon Creations for PLL design, TSMC for chip fabrication, Winpac for package substrate design and assembly, Fidus for board design and assembly, and DA-Integrated for testing.


P. Gillingham, D. Chinn, E. Choi, J.-K. Kim, D. Macdonald, H. Oh, and H.-B. Pyeon are with Conversant Intellectual Property Management, Inc., Ottawa, ON K2K 2X1, Canada.

R. Schuetz is the founder of a software startup focused on the development of equity trading algorithms and financial data analysis.

Corresponding author: P. Gillingham (

Color versions of one or more of the figures in this paper are available online at




Peter Gillingham


Peter Gillingham joined Conversant, formerly MOSAID, in 1989, as a Manager for DRAM Development. He occupied a series of increasingly senior management positions until being named Chief Technology Officer in 2006. Prior to joining MOSAID, he was an IC Design Engineer with Mitel Corporation involved in the development of ISDN products. He received the B.Eng. degree in electrical engineering and the M.Eng. degree in electronics from Carleton University, and the M.Sc. degree in management from Stanford University. He served as an Associate Editor of the IEEE Journal of Solid-State Circuits, as a Technical Program Committee Member for ISSCC and the VLSI Circuits Symposium, and as a member of the JEDEC standards committee.

David Chinn


David Chinn joined Conversant, formerly MOSAID, in 2004, as an IC CAD Specialist. He worked along with his fellow Research and Development team members on various projects. Prior to joining MOSAID, he was an Analog IC Design Engineer and Design Kit Engineer with Nortel Networks. He received the M.Eng. degree in electrical engineering from the City University of New York, and the B.Eng. degree in electrical engineering from Fudan University, China.

Eric Choi


Eric Choi was born in Suwon, Korea, in 1969. He received the B.S. degree in electronic engineering from Hanyang University, Korea, in 1992, and the M.S. degree in electronic engineering from Seoul National University, Korea, in 1994.

He was with SK Hynix Semiconductor, Icheon, Korea, from 1994 to 2009, where he developed DRAM process and designed high speed graphics DDR SDRAMs. He joined Conversant, formerly MOSAID, in 2010. He developed HLNAND devices and PHY for HLNAND systems.

Jin-Ki Kim


Jin-Ki Kim was born in Taegu, Korea, in 1964. He received the B.S. degree in electronic engineering from Yonsei University, Korea, and the M.B.A. degree from Queen's University, Ontario, Canada, in 1986 and 2010, respectively.

He joined the Memory Division of Samsung Electronics Corporation, Kiheung, Korea, in 1985, where he was involved in the circuit design of 64 Kb, 256 Kb, and 1 Mb EEPROMs. From 1990 to 1997, he led NAND Flash design groups for 8 Mb, 16 Mb, and 64 Mb, 128 Mb MLC, and NOR Flash design groups for 8 Mb and 16 Mb. In 1997, he joined Conversant Intellectual Property Management, formerly MOSAID Technologies. He led design groups for 256 Mb DDR2 DRAM, 9 Mb DCAM, embedded DRAM, and HyperLink NAND Flash. Since 2007, he has been a Vice President of the Technology Research and Development Group.


Don Macdonald

Don Macdonald was born in New Brunswick, Canada, in 1961. After completing an Engineering Technology Diploma, he joined MOSAID Technologies working for the Systems group supporting the MOSAID Tester design effort. In 1991, he was a Custom IC Layout Designer with MOSAID for the Semiconductor Group. In 2005, he joined ST Microelectronics and gained experience doing Place and Route IC Layout. Returning to MOSAID in 2008, he worked in the R&D group combining custom IC layout and ASIC style layout.

Hakjune Oh


Hakjune Oh has over 23 years of semiconductor memory product experience with SK Hynix and Conversant. He has worked on the research and development of high-speed, low-cost solid-state storage systems based on the high speed DDR-800 HLNAND products and NAND Flash control logic. Prior to joining Conversant, he led various memory IC design engineering teams at SK Hynix. He holds over 200 U.S. and worldwide issued and pending patents for his work in leading edge technologies in DRAM and NAND Flash memories. He received the B.Sc. degree in electrical engineering from Yonsei University.

Hong-Beom Pyeon


Hong-Beom Pyeon was born in Anmyeon, Korea, in 1964. He received the B.S. and M.S. degrees in electronic engineering from Chung-Ang University, Seoul, Korea, in 1987 and 1989, respectively.

In 1989, he joined LG Semicon, Seoul, as a Memory Test Engineer, and became a Memory Design Engineer in 1991. In 1998, he was a Principal DRAM Design Engineer leading technical staff. In 2000, he joined MOSAID Technologies, Inc., Ottawa, ON, Canada, as a Senior IC Design Engineer, where he was involved in DCAM and DRAM designs. Since 2012, he has been the Director of Research and Development responsible for developing HLNAND technology and PHY designs along with researching new emerging memories.

Roland Schuetz


Roland Schuetz is currently founder of a software startup focused on the development of equity trading algorithms and financial market data analysis. He held successive positions at MOSAID (now Conversant) including Systems Architect, Senior Systems Architect, and Director of Application and Business Initiatives working on the development of the HLNAND Flash memory standard and SSD architecture. He holds several patents. He was with ATI (now AMD) developing DRAM system-memory controllers and graphics processors in their Integrated Graphics Chip Set Division where he helped to pioneer ATI's Chip Set business. His entrepreneurial history includes involvement in the virtual-reality gaming industry and running a sports related website. He received the B.Sc. degree in electrical engineering from the University of Toronto's Engineering Science Program.
