Even so, the L3 access time is usually at least five times faster than a DRAM access. On-chip, cache SRAMs are normally organized with a width that matches the block size of the cache, with tags stored in parallel to each block.

This allows an entire block to be read out or written in a single cycle. This capability is particularly useful when writing data fetched from a miss into the cache or when writing back a block that must be evicted from the cache.

The access time to the cache (ignoring the hit detection and selection in a set associative cache) is proportional to the number of blocks in the cache. The energy consumption, however, depends both on the number of bits in the cache (static power) and on the number of blocks (dynamic power). Set associative caches reduce the initial access time to the memory because the memory being accessed is smaller. In exchange, they increase the time for hit detection and block selection, a topic we will cover in Section 2.
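To make the block-wide organization and the "smaller memory" effect concrete, here is a minimal sketch that computes the geometry of a cache whose SRAM rows each hold one whole block plus its tag. It rests on several assumptions that are not from the text. Each way is treated as a separate SRAM array with one row per set. Addresses are taken to be 48 bits wide. Valid, dirty, and replacement bits are ignored. The function name `cache_geometry` is hypothetical.

```python
import math

def cache_geometry(capacity_bytes, block_bytes, ways, addr_bits=48):
    """Toy geometry for a cache whose SRAM rows are one block wide plus a tag.

    Assumes each way is a separate SRAM array with one row per set, and
    ignores valid/dirty/replacement bits.
    """
    for n in (capacity_bytes, block_bytes, ways):
        assert n > 0 and n & (n - 1) == 0, "sizes must be powers of two"
    blocks = capacity_bytes // block_bytes
    rows_per_way = blocks // ways            # rows in each (smaller) SRAM array
    offset_bits = int(math.log2(block_bytes))
    index_bits = int(math.log2(rows_per_way))
    tag_bits = addr_bits - index_bits - offset_bits
    row_bits = block_bytes * 8 + tag_bits    # whole block + its tag, read in one cycle
    return {
        "blocks": blocks,
        "rows_per_way": rows_per_way,
        "tag_bits": tag_bits,
        "row_bits": row_bits,
        "total_bits": blocks * row_bits,     # rough proxy for static power
    }

for ways in (1, 8):
    print(ways, cache_geometry(32 * 1024, 64, ways))
```

Take a 32 KiB cache with 64-byte blocks. In the direct-mapped version, every access reads from a 512-row array. In the 8-way version, each array has only 64 rows, but eight tags must then be compared and one block selected. Under this model, that is the trade-off the paragraph describes.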

DRAM Technology

As early DRAMs grew in capacity, the cost of a package with all the necessary address lines was an issue. The solution was to multiplex the address lines, cutting the number of address pins in half. Modern DRAMs are organized in banks, up to 16 for DDR4. Each bank consists of a series of rows.

Sending an ACT (Activate) command opens a bank and a row and loads the row into a row buffer. When the row is in the buffer, it can be transferred by successive column addresses at whatever the width of the DRAM is (typically 4, 8, or 16 bits in DDR4) or by specifying a block transfer and the starting address.

The Precharge command (PRE) closes the bank and row and readies it for a new access. Each command, as well as each block transfer, is synchronized with a clock; the next section, on SDRAM, discusses this further. The row and column signals are sometimes called RAS and CAS, based on the original names of the signals.
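The command sequence above can be sketched as a small state machine for a single bank with one row buffer. This is a simplified model under stated assumptions. All timing constraints between commands are ignored. "RD" is our label for a column read. The class and method names are hypothetical. The model shows only when an ACT or PRE is required.

```python
class Bank:
    """Toy model of one DRAM bank and its row buffer; timing is ignored."""

    def __init__(self):
        self.open_row = None   # None: bank is precharged (no row open)
        self.trace = []        # commands issued, in order

    def read(self, row, col):
        if self.open_row != row:
            if self.open_row is not None:
                self.trace.append("PRE")          # close the old row first
            self.trace.append(f"ACT row={row}")   # load the new row into the row buffer
            self.open_row = row
        self.trace.append(f"RD col={col}")        # column access from the row buffer

    def precharge(self):
        if self.open_row is not None:
            self.trace.append("PRE")
            self.open_row = None

bank = Bank()
bank.read(5, 0)
bank.read(5, 8)    # same row: no new ACT needed
bank.read(9, 0)    # different row: PRE, then ACT
bank.precharge()
print(bank.trace)
```

The second read reuses the open row. The third read must first close that row with PRE and then open the new one with ACT.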

One-half of the address is sent first during the row access strobe (RAS). The other half of the address, sent during the column access strobe (CAS), follows it. The names come from the internal chip organization, because the memory is organized as a rectangular matrix addressed by rows and columns.
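Address multiplexing amounts to splitting the cell address into an upper half and a lower half that share the same pins. The minimal sketch below assumes the row half is the high-order bits and the column half is the low-order bits. The function name `multiplex` and the 1024 x 1024 organization are hypothetical, chosen only for illustration.

```python
def multiplex(addr, row_bits, col_bits):
    """Split a cell address into the halves sent on the shared address pins.

    Returns (pins_needed, [("RAS", row), ("CAS", col)]).
    """
    assert 0 <= addr < (1 << (row_bits + col_bits)), "address out of range"
    row = addr >> col_bits                  # upper half, sent first with RAS
    col = addr & ((1 << col_bits) - 1)      # lower half, sent next with CAS
    pins = max(row_bits, col_bits)          # vs. row_bits + col_bits without multiplexing
    return pins, [("RAS", row), ("CAS", col)]

# A hypothetical 1 Mbit part organized as 1024 x 1024 cells.
print(multiplex(3 * 1024 + 5, 10, 10))   # row 3, column 5, on 10 pins instead of 20
```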

An additional requirement of DRAM derives from the property signified by its first letter, D, for dynamic. To pack more bits per chip, DRAMs use only a single transistor, which effectively acts as a capacitor, to store a bit.

On reading, a row is placed into a row buffer, where CAS signals can select a portion of the row to read out from the DRAM. Because reading a row destroys the information, it must be written back when the row is no longer needed. This write back happens in overlapped fashion, but in early DRAMs it meant that the cycle time (the time before a new row could be read) was longer than the time to read a row and access a portion of that row.

In addition, because the charge stored in a cell leaks away, each bit must be refreshed periodically to prevent loss of information. Fortunately, all the bits in a row can be refreshed simultaneously by reading that row and writing it back. Hence every DRAM in the memory system must access every row within a certain time window, such as 64 ms. DRAM controllers include hardware to refresh the DRAMs periodically. This requirement means that the memory system is occasionally unavailable because it is sending a signal telling every chip to refresh.
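The cost of meeting the refresh window is simple arithmetic once two parameters are fixed. The sketch below assumes the controller spreads refresh operations evenly over the 64 ms window. The operation count (8192) and the per-operation busy time (350 ns) are illustrative values, not taken from the text. The function name `refresh_schedule` is hypothetical.

```python
REFRESH_WINDOW_NS = 64_000_000   # every row within 64 ms, per the text

def refresh_schedule(refresh_ops, busy_ns_per_op):
    """Spread refresh_ops evenly over the window; return (interval_ns, busy_fraction)."""
    interval_ns = REFRESH_WINDOW_NS / refresh_ops    # time between refresh operations
    busy_fraction = busy_ns_per_op / interval_ns     # share of time the DRAM is unavailable
    return interval_ns, busy_fraction

# Illustrative parameters: 8192 refresh operations per window, 350 ns each.
interval, busy = refresh_schedule(8192, 350)
print(f"refresh every {interval / 1000:.2f} us, unavailable {busy:.1%} of the time")
```

Doubling the busy time per operation, or doubling the number of operations in the same window, doubles the fraction of time the memory is unavailable.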

The time for a refresh is a row activation and a precharge that also writes the row back. This takes roughly two-thirds of the time needed to read a datum, because no column select is needed.

Because the memory matrix in a DRAM is conceptually square, the number of steps in a refresh is approximately the square root of the DRAM capacity.

So far we have presented main memory as if it operated like a Swiss train, consistently delivering the goods exactly according to schedule.

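In fact, with SDRAMs, access latency varies. A DRAM controller (usually on the processor chip) tries to optimize accesses by avoiding opening new rows and by using block transfers when possible, so the time for a given access depends on the accesses that preceded it. Refresh adds another unpredictable factor.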
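One way a controller can avoid opening new rows is to serve pending requests that hit the currently open row before older requests to other rows. This idea resembles the row-hit-first scheduling behind policies such as FR-FCFS. The sketch below is a simplified, hypothetical model, not any particular controller's algorithm. It assumes a single bank, treats each request as just a row number, and ignores timing and refresh. It counts how many ACT commands each service order needs.

```python
def activations(rows):
    """Number of ACT commands to serve row requests in order on one bank (open-row policy)."""
    acts, open_row = 0, None
    for row in rows:
        if row != open_row:
            acts += 1          # a different row must be activated
            open_row = row
    return acts

def row_hits_first(rows):
    """Greedy reorder: serve a pending request to the open row if one exists, else the oldest."""
    pending, order, open_row = list(rows), [], None
    while pending:
        chosen = open_row if open_row in pending else pending[0]
        pending.remove(chosen)   # removes the oldest matching request
        order.append(chosen)
        open_row = chosen
    return order

queue = [1, 2, 1, 2, 1]
print(activations(queue), activations(row_hits_first(queue)))  # 5 2
```

Reordering cuts the activations from five to two here, but the requests to row 2 now wait longer than they would in arrival order. An unbounded row-hit-first policy could starve such requests indefinitely, so a practical controller would need to limit how far it reorders.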



