Path: bloom-picayune.mit.edu!snorkelwacker.mit.edu!americast.com!americast.com!americast-post
Newsgroups: americast.ieee
From: americast-post@AmeriCast.Com
Organization: American Cybercasting
Approved: americast-post@AmeriCast.com
Subject: Synchronous dynamic RAM
Date: Wed, 4 Nov 92 10:20:19 EST
Message-ID: <1049.art.1992Nov4.102019@AmeriCast.com>

Synchronous dynamic RAM

Betty Prince, Roger Norwood, Joe Hartigan, Wilbur C. Vogley
Texas Instruments Inc.

Data rates of 500 megabytes per second are expected for a dynamic RAM with a new architecture, new interfaces, and a new name: the synchronous dynamic RAM. In other words, the 100-200-MHz microprocessor of the mid-1990s could function nonstop if this kind of dynamic RAM (DRAM) were used for main memory. In addition, in simple cases, main memory might once again handle special functions like graphics, which currently need specialty DRAMs to achieve high enough data rates. Yet the synchronous DRAM is an evolutionary approach to high-speed memory, in that it ties together, in a single device, many of the diverse techniques developed over the years to augment memory transfers. The plan is for it to become a multisourced, commodity chip.

SPEED MISMATCH. Standard DRAMs and the new microprocessors have increasingly diverged in speed. Basic DRAM architecture has not changed since the early 1970s, whereas design and architectural innovations have improved microprocessors to the point where speeds of 500 MHz or more look possible by the end of the decade. DRAMs, on the other hand, currently take about 60 ns to access a row address, or 30 ns from address to data available in page mode, which is roughly a 25-MHz cycle rate. The problem is compounded by the increasing density of dynamic RAMs, which further adds to the time needed to access the total memory through a memory bus of a given width.

CHIP SOLUTIONS. Still, as DRAM chips have become larger, chip-level solutions have begun to appear. The cache, for example, can be included on the DRAM. Fast-access modes such as page mode, static column mode, or nibble mode have helped to nearly double basic DRAM speed. Page and static column modes work by keeping the active row data latched in the sense amplifiers, and merely selecting new random column addresses. Nibble mode uses wider internal architectures (by 4) and fast registers to make a 4-bit parallel-to-serial conversion. The result is fast access to 4 consecutive bits after a random address. Although these modes can roughly double the DRAM's bandwidth over normal (non-mode) operation, they still fall far short of today's processor needs. Wider external data buses (8 or 16 bits wide, for example) can also allow more information to be retrieved from the DRAM on a single access. The penalties here are larger chips and packages and greater DRAM output noise, all of which tend to slow down the read access.

NONPROPRIETARY. In the synchronous DRAM, many manufacturers, including Texas Instruments, Samsung, Hitachi, Toshiba, Mitsubishi, and NEC, are addressing these speed concerns in an evolutionary, nonproprietary approach. Initially, they are developing synchronous DRAMs of 16M-bit capacity. These are currently being considered for open standardization in the Electronic Industries Association/Joint Electron Devices Engineering Council (EIA/Jedec) JC42.3 DRAM Standards Committee. (While synchronous static RAMs have been around for some years, the idea of using a dynamic RAM synchronously has only recently gained acceptance.)
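To put the mismatch in perspective, the back-of-the-envelope sketch below (in C) converts the cycle times quoted above into peak data rates. The byte-wide and 16-bit-wide assumptions and the best-case page-hit assumption are purely illustrative; sustained rates in a real system are lower.

    /* Back-of-the-envelope peak-bandwidth sketch using the figures quoted
     * above.  Assumes every access hits the open page (best case); the
     * byte-wide and 16-bit-wide cases are illustrative assumptions. */
    #include <stdio.h>

    static double mbytes_per_sec(double cycle_ns, double bytes_per_cycle)
    {
        /* bytes per cycle / (cycle_ns * 1e-9 s), expressed in MB/s */
        return bytes_per_cycle * 1000.0 / cycle_ns;
    }

    int main(void)
    {
        printf("page mode, x8,  30 ns/cycle: %6.1f MB/s\n", mbytes_per_sec(30.0, 1.0));
        printf("page mode, x16, 30 ns/cycle: %6.1f MB/s\n", mbytes_per_sec(30.0, 2.0));
        printf("synchronous DRAM target    : %6.1f MB/s\n", 500.0);
        return 0;
    }

Even the wider page-mode case lands near 67 MB/s, an order of magnitude short of the 500-MB/s goal cited above.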
Historically, DRAMs have been controlled asynchronously: the processor presents addresses and control levels to the memory, indicating that a set of data at a particular location in memory should be either read from or written into the DRAM (for a write, the processor also presents the data it wants written). After a delay--the access time--the RAM either writes the new information from the processor into its memory or else provides the information on its outputs for the processor to read. During the access-time delay, the DRAM performs various internal functions, such as activating the high capacitance of the word lines and bit lines, sensing the data, and routing the data out through the output buffers. The delay creates a wait state for the processor; it simply waits for the RAM's response, and the entire system slows down as a result.

Under synchronous control, on the other hand, the DRAM latches information in and out under the control of the system clock. The processor drops off its instructions for the DRAM in a set of latches, which hold the addresses, data, and control signals on the DRAM inputs until the memory can process the request. The DRAM responds after a set number of clock cycles, which can be programmed by the user in a special configuration cycle.

NO WAITING. Since the processor knows how many clock cycles it takes for the DRAM to respond, it can safely go off and do other tasks while the RAM is processing its requests. An example is a DRAM that has a 60-ns read delay after initial addressing and is operated with a 10-ns (100-MHz) clock. If the RAM is asynchronous, the processor waits the full 60-ns access time for the information. But if the DRAM is synchronous, the processor can strobe the addresses into a set of input latches and do other tasks while the memory does the read operation. Then, when the processor clocks the outputs of the RAM six cycles (60 ns) later, the data it wants is there.

In the synchronous DRAM timing diagram, the row address is strobed in on the rising edge of the system clock, activating a row, or word line, of memory bits. A column address is then clocked in after three clock cycles (30 ns), sufficient time to activate the word line. The byte of data then appears on the outputs after three more cycles (another 30 ns), enough time to decode the new address and get the data from the sense amplifiers through the output buffers. Six clock cycles after the row address has been clocked in, the processor can expect the requested information from the output buffers of the synchronous DRAM.

The architecture of the synchronous DRAM can further shorten its average access time by pipelining addresses. The input latch stores the next address the processor will want while the RAM is operating on the previous address. Normally the addresses to be accessed are known several cycles in advance by the processor; therefore, after the address for the first access has been sent and the RAM has started working on it, the processor can send the following address to the input latch, so that it is available as soon as the first address has moved on to the next stage of processing. The processor need not wait a full access cycle before starting its next access to the DRAM. Various techniques can also be used inside the synchronous DRAM to hide the components of the internal timing delay.
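The clock-cycle arithmetic above can be made concrete with a short sketch. The C fragment below uses the figures from the example (10-ns clock, column address three cycles after the row address, data three cycles after the column address) and assumes, purely for illustration, that a new column address can be pipelined in on every clock once the row is active; no particular vendor's part is implied.

    /* Idealized timing sketch of the clocked access described above:
     * 10-ns clock, row-to-column spacing of 3 cycles, column-to-data
     * latency of 3 cycles.  Accepting a new column address every clock
     * is an assumption made for illustration. */
    #include <stdio.h>

    enum { CLOCK_NS = 10, ROW_TO_COL = 3, COL_TO_DATA = 3 };

    int main(void)
    {
        int n_reads = 4;                             /* pipelined reads on one open row */
        for (int i = 0; i < n_reads; i++) {
            int col_cycle  = ROW_TO_COL + i;         /* column address i strobed here   */
            int data_cycle = col_cycle + COL_TO_DATA;/* data for address i appears here */
            printf("read %d: column at cycle %d, data at cycle %d (%d ns after row)\n",
                   i, col_cycle, data_cycle, data_cycle * CLOCK_NS);
        }
        return 0;
    }

The first word arrives six cycles (60 ns) after the row address, matching the example; under the pipelining assumption, each following word arrives only one clock later, rather than a full access time later.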
The address setup time and word and bit line precharge time can be eliminated after the first access by using a burst mode, in which a series of data bits can be clocked out rapidly after the first bit has been accessed. The burst mode in the synchronous DRAM is similar to the old nibble mode, in which 4 bits of sequential data are provided in rapid succession without inputting new address information to the DRAM--but now as much as a full page of data can be provided [Fig. 1]. Burst mode is only useful if all the bits to be accessed are in sequence and in physically the same row of cells as the initial access. Likely applications include high-speed memory functions such as video support requiring 100-MHz data rates.

Burst mode can be combined with a "wrap" feature. The wrap gives rapid access, for example, to strings of bits stored both before and after the initial bit location. This feature is useful for cache filling, since the bits most likely to be wanted next are those physically close to the current bit. The user can program both the type of wrap and the number of bytes available on each wrap.

If data from different rows is needed, it is still possible to hide some of the row precharge time if the two rows lie in different banks. With the synchronous DRAM, the multiple-bank interleaving previously done at the system level may be moved onto the chip. Then one bank can be precharged while another is being accessed. An example is the "nibbled-page" architecture of Tokyo's Toshiba Corp., in which the data from 8-bit sections of different columns is interleaved on-chip to give byte-level random access at a 100-Mb/s rate.

Another advantage of multiple-bank synchronous DRAMs is that the active rows (potentially one in each bank) may serve as a cache. In more detail, once a given row in a given bank is accessed, it is held active and may be accessed again simply by supplying a new column address. This method has been used in page-mode devices, but had only limited success because only one row in an asynchronous DRAM could be held active.

ADDING ASSETS. All these features--burst and wrap modes and interleaved banks--can be combined on a single synchronous DRAM that runs at up to 400-500 MHz. And still more features may be added. For example, a data mask control can be used as an enable/disable pin to ignore inputs or turn the outputs off for a single clock cycle. This could be useful, especially in write cycles, if the user wants to access a string of bits, but not all the bits in the string. Another feature, clock enable/disable, turns off the system clock inside the RAM, thereby suspending the device in its current state, or puts it into low-power standby mode, saving energy in battery-powered equipment.

Synchronous DRAMs need a refresh cycle, since they are still composed of dynamic cells that lose their charge. But another option, self-refresh, appears on many of the newer byte-wide DRAMs. This new, simpler refresh mode is completely controlled on the chip and retains data in a low-power mode. The self-refresh option is expected to be available on synchronous DRAMs from many vendors.
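To make the wrap feature concrete, the sketch below shows one plausible wrap ordering: column addresses advance sequentially but stay inside an aligned block and wrap around its end. The block length of 8, the sequential ordering, and the function names are illustrative assumptions; as noted above, the actual wrap type and length on a synchronous DRAM are programmable and were still being standardized when this was written.

    /* Illustrative sketch of a "wrap" burst: starting from an arbitrary
     * column address, successive addresses stay inside an aligned block
     * of wrap_len columns and wrap around its end.  One plausible
     * (sequential) scheme, not any standard's definition. */
    #include <stdio.h>

    /* wrap_len must be a power of two (e.g. 4 or 8) */
    static unsigned wrap_addr(unsigned start_col, unsigned wrap_len, unsigned i)
    {
        unsigned base   = start_col & ~(wrap_len - 1);      /* aligned block       */
        unsigned offset = (start_col + i) & (wrap_len - 1); /* wraps within block  */
        return base | offset;
    }

    int main(void)
    {
        unsigned start = 0x36, len = 8;
        printf("wrap of %u starting at column 0x%02X:", len, start);
        for (unsigned i = 0; i < len; i++)
            printf(" 0x%02X", wrap_addr(start, len, i));
        printf("\n");   /* 0x36 0x37 0x30 0x31 0x32 0x33 0x34 0x35 */
        return 0;
    }

A cache controller filling a line would thus get the critical word first and the neighboring words, before and after it, in the following clocks.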
The final speed attained by this kind of DRAM in a system depends not only on its internal architecture but also on the interface signal levels it uses. A 5- or 3.3-V interface can function smoothly in, say, a desktop computer running at 50 MHz. If, however, these same interfaces, with their relatively large 2-V output swings (0.4 V low to 2.4 V high), run in a 125-MHz system, unterminated lines longer than a few centimeters could cause delays. Therefore, higher-speed synchronous DRAMs will have low-swing interfaces such as Gunning transceiver logic (GTL) or center-tap-terminated (CTT) ["Fast interfaces for DRAMs," pp. 54-57] to compensate for transmission-line effects.

SHORTER LEADS. The high speed of synchronous DRAMs influences their packaging. Most appropriate for them are new miniature packages such as the thin small-outline package (TSOP) or various vertical surface-mount packages. These reduce the effective length of the package wiring and leads, and devices may be tightly spaced on a circuit board.

Advances in fast DRAM architecture, high-speed interfaces, and miniature packaging combine in the synchronous DRAM into a widely sourced device type that could become the next-generation commodity DRAM. Whether the promise is fulfilled depends on several factors: producers must standardize their products, and they must produce them in the high volume needed to bring costs down. No less important, many mainstream DRAM manufacturers must offer synchronous DRAMs so that users may have alternative sources to rely on.

Copyright 1992, IEEE Spectrum. For more information, send e-mail to American Cybercasting Corporation (usa@AmeriCast.COM)