CS 2: Caches
Deep dive into caches and memory hierarchy.
  • CS
  • Cache
  • Memory
February 22, 2026
7 min read

What is a Cache?

A smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device.


Fundamental idea of a memory hierarchy

For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k + 1.
Because most accesses are satisfied at level k, level k + 1 is used less often and does not need to be as fast.

Why do memory hierarchies work?

Because of locality, programs tend to access the data at level k more often than the data at level k + 1.
Thus, the storage at level k + 1 can be slower, and therefore larger and cheaper per bit.

What does a memory hierarchy give us?

It creates a large pool of storage that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.
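This trade-off can be quantified with the standard average memory access time (AMAT) formula. The latency numbers below are illustrative assumptions, not measurements of real hardware:

```python
# AMAT = hit_time + miss_rate * miss_penalty
# If most accesses hit the fast level, the average stays close to the fast level's latency.
def amat(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average time per access for a two-level hierarchy."""
    return hit_time_ns + miss_rate * miss_penalty_ns

# Assume a 1 ns cache, a 100 ns main memory, and a 95% hit rate.
print(amat(1, 0.05, 100))  # 1 + 0.05 * 100 = 6.0 ns on average
```

Even though main memory is 100x slower here, the average access time is only 6x the cache latency, because the slow level is rarely touched.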

The memory hierarchy

Register

Registers are located inside the CPU and hold the values the CPU is currently working on.

Cache

Caches are typically split into three levels: L1, L2, and L3.



All three levels (L1, L2, and L3) are built from so-called SRAM.

SRAM?

SRAM stands for Static Random Access Memory.
RAM is the computer's main working memory. Let's break down why this kind is called Static RAM. SRAM uses a flip-flop (a small circuit of transistors) to store each bit.
Once a bit is stored, it stays there as long as power is on, without needing to be refreshed.
That's why it's called "static": the stored value does not change unless the CPU writes to it.

Main Memory

Main memory is the computer's primary workspace where programs and data live while the computer is running.

DRAM?

DRAM stores each bit as charge on a tiny capacitor. The capacitor leaks charge over time, so each bit must be refreshed periodically to keep its value. That's why it's called "dynamic": the stored charge is constantly decaying unless refreshed.

Understanding CPU → Cache → RAM Flow

When a program runs, the CPU constantly reads and writes data from memory.
However, RAM is much slower than the CPU.

To reduce this speed gap, computers use cache memory — a small, very fast memory located close to the CPU.


Step 1: CPU Requests Data

When the CPU needs data:

1. It first checks the cache.
2. Only if the data is not there does it go to RAM.

Why?
Because the cache is much faster than RAM.
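The lookup flow above can be sketched in a few lines. This is a minimal model where plain dicts stand in for the cache and RAM hardware:

```python
# CPU -> cache -> RAM lookup flow: check the fast cache first,
# fall back to slow RAM on a miss, and cache the result for next time.
def read(address, cache, ram, stats):
    if address in cache:          # cache hit: fast path
        stats["hits"] += 1
        return cache[address]
    stats["misses"] += 1          # cache miss: go to slow RAM
    value = ram[address]
    cache[address] = value        # copy into the cache for future accesses
    return value

ram = {0x10: 42}
cache, stats = {}, {"hits": 0, "misses": 0}
read(0x10, cache, ram, stats)   # first access: miss, fetched from RAM
read(0x10, cache, ram, stats)   # second access: hit, served from cache
print(stats)  # {'hits': 1, 'misses': 1}
```

The second access never touches RAM, which is exactly why repeated accesses to the same data are cheap.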


Step 2: How the Cache is Checked

The method depends on the cache architecture.

1) Direct-Mapped Cache

Each memory block can go in exactly one cache line, determined by its address.

Analogy:
"This book must always go on shelf #5."


2) Fully Associative Cache

A memory block can be placed in any cache line.

Analogy:
"You can put the book on any empty shelf."


3) Set-Associative Cache

The address selects a set, and the block can go in any line within that set.

Analogy:
"Your book belongs in Section A, but you can choose any shelf inside Section A."
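The three placement policies can be compared directly by asking: given a memory block, which cache lines may hold it? The cache geometry below (8 lines, 2 ways) is an illustrative assumption:

```python
# Which cache lines may hold a given memory block under each policy?
NUM_LINES, WAYS = 8, 2  # assumed: 8 lines; 2-way => 4 sets of 2 lines

def candidate_lines(block, policy):
    if policy == "direct":   # exactly one possible line
        return [block % NUM_LINES]
    if policy == "fully":    # any line will do
        return list(range(NUM_LINES))
    if policy == "set":      # any line within the block's set
        num_sets = NUM_LINES // WAYS
        s = block % num_sets
        return [s * WAYS + w for w in range(WAYS)]

print(candidate_lines(13, "direct"))  # [5]        - one fixed shelf
print(candidate_lines(13, "set"))     # [2, 3]     - any shelf in Section 1
print(candidate_lines(13, "fully"))   # [0, ..., 7] - any shelf at all
```

Direct mapping is cheap to check (one candidate) but prone to conflicts; full associativity avoids conflicts but must search everywhere; set-associativity is the usual compromise.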


Step 3: Cache Hit vs Cache Miss

Cache Hit ✅

The requested data is already in the cache, so the CPU gets it almost immediately.

Cache Miss ❌

The data is not in the cache, so it must be fetched from RAM and copied into the cache for future accesses.


Why This Works

Programs tend to reuse data frequently (this is called locality).

Because of locality:

- data accessed once is likely to be accessed again soon (temporal locality)
- data near a recently used address is likely to be accessed next (spatial locality)

This design allows computers to:

- serve most accesses at nearly the speed of the small, fast cache
- while offering the capacity of the large, cheap RAM
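Spatial locality is easy to demonstrate with a toy simulation. Assuming 8-byte blocks, a sequential scan (like walking through an array) only misses on the first byte of each block:

```python
# Spatial locality in action: sequential byte accesses mostly land in a
# block that was just brought into the cache. Block size is an assumption.
BLOCK = 8
cached_blocks, hits, misses = set(), 0, 0
for addr in range(64):            # scan 64 consecutive byte addresses
    block = addr // BLOCK
    if block in cached_blocks:
        hits += 1
    else:
        misses += 1               # first touch of a block: miss, then cache it
        cached_blocks.add(block)
print(hits, misses)  # 56 8  -> 7 of every 8 accesses hit
```

A random access pattern over a large region would get no such benefit, which is why cache-friendly code tries to access memory sequentially.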

🔹 Cache Line Structure

Each cache line (a single “slot” in the cache) has three parts:

| Part | Meaning |
|-----------|---------|
| Valid Bit | 1 if the line contains valid data, 0 if empty |
| Tag | Identifies which RAM block this line holds |
| Data Block (2^b bytes) | The actual data stored; size depends on block size |

Key point: The CPU only cares about addresses and bytes.
It reads as many bytes as the data type requires (int = 4 bytes, char = 1 byte); the cache simply provides bytes starting from the offset.
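The three-part line structure can be modeled directly. Here is a minimal sketch with an assumed 4-byte block (b = 2):

```python
# A cache line as described above: valid bit, tag, and a 2^b-byte data block.
class CacheLine:
    def __init__(self, block_bytes=4):
        self.valid = 0                      # 1 if the line holds real data
        self.tag = None                     # identifies which RAM block is stored
        self.data = bytearray(block_bytes)  # the actual bytes

    def fill(self, tag, data):
        self.valid, self.tag = 1, tag
        self.data[:] = data

    def read(self, offset, size):
        """The cache just hands back `size` bytes starting at `offset`."""
        return bytes(self.data[offset:offset + size])

line = CacheLine()
line.fill(tag=0b01, data=b"\x0a\x0b\x0c\x0d")
print(line.read(offset=1, size=2))  # b'\x0b\x0c'
```

Note that `read` knows nothing about data types; it is the CPU that decides to ask for 2 bytes here.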

🔹 Address Breakdown

For a memory address, we split the bits into three fields:

| Tag | Set Index | Block Offset |

Example:

With 1 set-index bit and 1 offset bit, the 4-bit address 0111 splits as 01 | 1 | 1:
Tag = 01, Set Index = 1, Block Offset = 1
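The split is just shifting and masking. This sketch assumes the same field widths as the example (1 set-index bit, 1 offset bit):

```python
# Split an address into (tag, set index, block offset) with bit masks.
SET_BITS, OFFSET_BITS = 1, 1  # assumed widths, matching the example above

def split(address):
    offset = address & ((1 << OFFSET_BITS) - 1)
    set_index = (address >> OFFSET_BITS) & ((1 << SET_BITS) - 1)
    tag = address >> (OFFSET_BITS + SET_BITS)
    return tag, set_index, offset

print(split(0b0111))  # (1, 1, 1) -> tag = 01, set index = 1, offset = 1
```

Real caches do exactly this in hardware: the low bits pick the byte within a block, the middle bits pick the set, and whatever is left becomes the tag.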

🔹 Cache Simulation Example (Direct-Mapped)

Setup:

- Direct-mapped cache with 4 sets (2 set-index bits)
- Block size = 2 bytes (1 offset bit)
- Access sequence: 0, 1, 7, 8, 0

Hit/Miss Result:

| Address | Set | Tag | Hit/Miss |
|---------|-----|-----|----------|
| 0 | 00 | 0 | Miss |
| 1 | 00 | 0 | Hit |
| 7 | 11 | 0 | Miss |
| 8 | 00 | 1 | Miss |
| 0 | 00 | 0 | Miss |

Notes:

- Address 8 maps to the same set (00) as address 0 but carries a different tag, so it evicts address 0's block.
- The final access to 0 therefore misses again: this is a conflict miss.

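The direct-mapped table can be reproduced with a few lines of simulation (4 sets, 2-byte blocks, each set holding a single tag):

```python
# Direct-mapped cache simulation for the access sequence 0, 1, 7, 8, 0.
SETS, OFFSET_BITS, SET_BITS = 4, 1, 2
lines = {}  # set index -> tag currently stored in that set's single line

def access(addr):
    s = (addr >> OFFSET_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (OFFSET_BITS + SET_BITS)
    if lines.get(s) == tag:
        return "Hit"
    lines[s] = tag  # on a miss, the new block evicts whatever was there
    return "Miss"

results = [access(a) for a in [0, 1, 7, 8, 0]]
print(results)  # ['Miss', 'Hit', 'Miss', 'Miss', 'Miss']
```

The final miss on address 0 matches the table: address 8 evicted its block even though most of the cache was empty.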

🔹 Cache Simulation Example (2-Way Set-Associative)

Setup:

- 2-way set-associative cache with 2 sets (1 set-index bit)
- Block size = 2 bytes (1 offset bit)
- Same access sequence: 0, 1, 7, 8, 0

Step-by-step Hit/Miss:

| Address | Set | Tag | Hit/Miss |
|---------|-----|-----|----------|
| 0 | 0 | 00 | Miss |
| 1 | 0 | 00 | Hit |
| 7 | 1 | 01 | Miss |
| 8 | 0 | 10 | Miss |
| 0 | 0 | 00 | Hit |

Notes:

- With two ways per set, the blocks for addresses 0 and 8 coexist in set 0.
- The final access to 0 now hits: associativity removes the conflict miss seen in the direct-mapped case.
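This table can also be reproduced in simulation. The sketch below assumes LRU replacement, the usual policy for small associative caches:

```python
# 2-way set-associative simulation: 2 sets, 2-byte blocks, LRU replacement.
SET_BITS, OFFSET_BITS, WAYS = 1, 1, 2
sets = {0: [], 1: []}  # each set: list of tags, most recently used last

def access(addr):
    s = (addr >> OFFSET_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (OFFSET_BITS + SET_BITS)
    ways = sets[s]
    if tag in ways:
        ways.remove(tag)
        ways.append(tag)  # refresh LRU order on a hit
        return "Hit"
    if len(ways) == WAYS:
        ways.pop(0)       # evict the least recently used tag
    ways.append(tag)
    return "Miss"

results = [access(a) for a in [0, 1, 7, 8, 0]]
print(results)  # ['Miss', 'Hit', 'Miss', 'Miss', 'Hit']
```

Compare the final access with the direct-mapped run: same addresses, same cache capacity, but the extra way turns the last miss into a hit.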