What is a Cache?
A smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device.
Fundamental idea of a memory hierarchy
For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k + 1.
In practice, since most accesses are satisfied at level k, level k + 1 is accessed less often and does not need to be as fast.
Why do memory hierarchies work?
Because of locality, programs tend to access the data at level k more often than they access the data at level k + 1.
Thus, the storage at level k + 1 can be slower, and therefore larger and cheaper per bit.
Usage of memory hierarchy?
It creates a large pool of storage that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.
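This "fast pool" effect can be quantified with the average memory access time (AMAT) formula: AMAT = hit_time + miss_rate × miss_penalty. A minimal sketch, with illustrative (not measured) latencies:

```python
# Average memory access time for a two-level hierarchy:
#   AMAT = hit_time + miss_rate * miss_penalty
# The numbers below are made up for illustration.
cache_hit_time_ns = 1     # fast storage near the top
ram_penalty_ns = 100      # slow storage near the bottom
miss_rate = 0.05          # locality keeps most accesses in the cache

amat = cache_hit_time_ns + miss_rate * ram_penalty_ns
print(f"AMAT = {amat} ns")  # close to cache speed, despite slow RAM
```

Even though RAM is 100× slower here, the average access costs only 6 ns, because 95% of accesses never reach RAM.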
The memory hierarchy
Register
Registers are located inside the CPU and hold the values the CPU is currently working on.
Cache
Caches are typically split into three levels: L1, L2, and L3.
- L1 Cache (Level 1)
- Built inside the CPU
- Very small (~32–64 KB)
- Extremely fast
- Holds instructions or data the CPU is using right now
- L2 Cache (Level 2)
- Slightly bigger (~256 KB–1 MB)
- Slower than L1 but still fast
- Stores data/instructions that may be needed soon
- L3 Cache (Level 3)
- Shared among CPU cores (~4–32 MB)
- Slower than L2, bigger
- Stores data that might be reused by any core
All of these caches (L1, L2, and L3) are made of so-called SRAM.
SRAM?
SRAM stands for Static Random Access Memory.
RAM is the computer's main memory. Let's break down why this kind is called Static RAM: SRAM uses a flip-flop (a small circuit of transistors) to store each bit.
Once a bit is stored, it stays there as long as power is on, without needing to be refreshed.
That's why it's called "static": the stored value does not change unless the CPU writes to it.
Main Memory
Main memory is the computer's primary workspace where programs and data live while the computer is running.
- It's larger than registers and caches.
- Usually made of DRAM (Dynamic RAM)
- Volatile: loses data when power is off.
DRAM?
DRAM stores each bit as charge on a tiny capacitor. The capacitor leaks charge over time, so each bit must be refreshed periodically to keep its value. That's why it's called "dynamic": the stored charge keeps decaying unless refreshed.
Understanding CPU → Cache → RAM Flow
When a program runs, the CPU constantly reads and writes data from memory.
However, RAM is much slower than the CPU.
To reduce this speed gap, computers use cache memory — a small, very fast memory located close to the CPU.
Step 1: CPU Requests Data
When the CPU needs data:
- It requests a specific memory address in RAM.
- But before accessing RAM, it checks the cache first.
Why?
Because cache is much faster than RAM.
Step 2: How the Cache is Checked
The method depends on the cache architecture.
1) Direct Mapped Cache
- Each RAM address maps to exactly one specific cache block.
- The CPU checks only that one block.
- Simple and fast.
- But collisions can occur.
Analogy:
"This book must always go on shelf #5."
2) Fully Associative Cache
- Any RAM address can be stored in any cache block.
- The cache may have to compare the tag against every block.
- More flexible.
- More complex hardware.
Analogy:
"You can put the book on any empty shelf."
3) Set-Associative Cache
- Each RAM address maps to one specific set.
- Within that set, the data can be stored in multiple blocks.
- The CPU searches only within that set.
- A balance between the two extremes.
Analogy:
"Your book belongs in Section A, but you can choose any shelf inside Section A."
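The three placement policies above differ only in which cache slots are candidates for a given address. A minimal sketch, assuming a hypothetical cache of 8 blocks with 2-byte blocks:

```python
# Which cache slots can hold the block containing a given address?
# Sizes are hypothetical, chosen only for the demo.
NUM_BLOCKS = 8
BLOCK_SIZE = 2

def direct_mapped_slot(addr):
    block = addr // BLOCK_SIZE
    return [block % NUM_BLOCKS]            # exactly one candidate slot

def fully_associative_slots(addr):
    return list(range(NUM_BLOCKS))         # any slot will do

def set_associative_slots(addr, ways=2):
    num_sets = NUM_BLOCKS // ways
    s = (addr // BLOCK_SIZE) % num_sets    # the address picks one set...
    return [s * ways + w for w in range(ways)]  # ...any way inside that set

print(direct_mapped_slot(7))     # one slot: [3]
print(set_associative_slots(7))  # two slots in the same set: [6, 7]
```

Direct-mapped returns a single slot (shelf #5), fully associative returns all of them (any shelf), and set-associative returns every way of one set (any shelf in Section A).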
Step 3: Cache Hit vs Cache Miss
Cache Hit ✅
- Data is found in cache.
- CPU uses it immediately.
- Very fast.
Cache Miss ❌
- Data is not in cache.
- CPU fetches it from RAM.
- The data is copied into the cache.
- Then the CPU uses it.
- Slower than a hit.
Why This Works
Programs tend to reuse data frequently (this is called locality).
Because of locality:
- Frequently used data stays in fast cache.
- Less-used data stays in larger, slower RAM.
This design allows computers to:
- Feel as fast as the small, fast memory at the top
- While storing as much data as the large, slow memory at the bottom
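The effect of locality on hit rate can be demonstrated with a tiny simulated cache. A sketch using an LRU cache of 4 entries (capacity and traces are made up for the demo):

```python
from collections import OrderedDict

# Tiny LRU cache to show why locality makes hierarchies work.
def hit_rate(trace, capacity=4):
    cache, hits = OrderedDict(), 0
    for addr in trace:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(trace)

# A loop reusing 3 addresses (strong temporal locality):
print(hit_rate([0, 1, 2] * 10))   # 0.9 -- only the first 3 accesses miss
# Every address distinct (no locality at all):
print(hit_rate(list(range(30))))  # 0.0 -- every access misses
```

With locality, 27 of 30 accesses hit the small cache; without it, every access falls through to the slow level.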
🔹 Cache Line Structure
Each cache line (a single “slot” in the cache) has three parts:
| Part | Meaning |
|-----------|---------|
| Valid Bit | 1 if the line contains valid data, 0 if empty |
| Tag | Identifies which RAM block this line belongs to |
| Data Block (2^b bytes) | The actual data stored; size depends on block size |
- b = number of block offset bits → tells CPU which byte inside the block to read
- Example: if b = 3 → 2³ = 8 bytes/block → offsets 0 to 7
Key point: The CPU only cares about addresses and bytes.
It reads as many bytes as the data type size (int = 4 bytes, char = 1 byte), but cache just provides bytes starting from the offset.
🔹 Address Breakdown
For a memory address, we split it like this:

| Tag | Set Index | Block Offset |
- Block Offset (b bits) → which byte inside the block
- Set Index (s bits) → which set in the cache the block belongs to
- Tag (t bits) → identifies which block in RAM this is
Example:
- Address = 7 (0111)
- Cache block size = 2 bytes → b = 1
- 2 sets → s = 1
- Remaining bits → t = 2 (tag)
Tag = 01, Set Index = 1, Block Offset = 1
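The field split can be sketched with a few bit operations; the call below reproduces the worked example (address 7 with b = 1, s = 1):

```python
# Split an address into tag / set index / block offset,
# following the | tag | set index | block offset | layout above.
def split_address(addr, s_bits, b_bits):
    offset = addr & ((1 << b_bits) - 1)               # lowest b bits
    set_index = (addr >> b_bits) & ((1 << s_bits) - 1)  # next s bits
    tag = addr >> (b_bits + s_bits)                   # remaining high bits
    return tag, set_index, offset

# Address 7 = 0b0111, 2-byte blocks (b = 1), 2 sets (s = 1)
print(split_address(7, s_bits=1, b_bits=1))  # (1, 1, 1): tag 01, set 1, offset 1
```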
🔹 Cache Simulation Example (Direct-Mapped)
Setup:
- Memory = 16 bytes → 4-bit addresses
- Block size = 2 bytes → b = 1
- Sets = 4 → s = 2
- 1 block per set → Direct-Mapped
- Address trace: 0, 1, 7, 8, 0
Hit/Miss Result:
| Address | Set | Tag | Hit/Miss |
|---------|-----|-----|----------|
| 0 | 00 | 0 | Miss |
| 1 | 00 | 0 | Hit |
| 7 | 11 | 0 | Miss |
| 8 | 00 | 1 | Miss |
| 0 | 00 | 0 | Miss |
Notes:
- Overwriting a block in a set is a collision: here, address 8 evicts the block holding addresses 0–1, so the final access to 0 misses again
- Cache Hit: data already in cache → fast
- Cache Miss: data not in cache → fetch from RAM → slower
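The direct-mapped trace above can be checked with a short simulator. A sketch using the stated parameters (b = 1, s = 2, one line per set):

```python
# Direct-mapped cache simulator for the trace above.
# 16-byte memory, 2-byte blocks (b = 1), 4 sets (s = 2), 1 line per set.
B_BITS, S_BITS = 1, 2
NUM_SETS = 1 << S_BITS

def simulate(trace):
    lines = [None] * NUM_SETS  # one tag slot per set; None = invalid line
    results = []
    for addr in trace:
        set_index = (addr >> B_BITS) & (NUM_SETS - 1)
        tag = addr >> (B_BITS + S_BITS)
        if lines[set_index] == tag:
            results.append("Hit")
        else:
            results.append("Miss")
            lines[set_index] = tag  # fetch from RAM, overwrite the line
    return results

print(simulate([0, 1, 7, 8, 0]))  # ['Miss', 'Hit', 'Miss', 'Miss', 'Miss']
```

Note the final miss: address 8 maps to the same set (00) as address 0, so it evicted that block, exactly the collision described above.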
🔹 Cache Simulation Example (2-Way Set-Associative)
Setup:
- Memory = 16 bytes → 4-bit addresses
- Block size = 2 bytes → b = 1
- Sets = 2 → s = 1
- Lines per set = 2 → 2-way set associative
- Address trace: 0, 1, 7, 8, 0
Step-by-step Hit/Miss:
| Address | Set | Tag | Hit/Miss |
|---------|-----|-----|----------|
| 0 | 0 | 00 | Miss |
| 1 | 0 | 00 | Hit |
| 7 | 1 | 01 | Miss |
| 8 | 0 | 10 | Miss |
| 0 | 0 | 00 | Hit |
Notes:
- Each set can hold 2 blocks, so collisions are less frequent than Direct-Mapped
- Cache keeps track of valid bit + tag + data block
- Block offset tells where inside the block to read the byte
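The 2-way trace can be verified the same way. A sketch with the stated parameters (b = 1, s = 1, 2 lines per set); the notes above do not name a replacement policy, so LRU is assumed here (it does not matter for this short trace, since no set overflows):

```python
# 2-way set-associative cache simulator for the trace above.
# 16-byte memory, 2-byte blocks (b = 1), 2 sets (s = 1), 2 lines per set.
B_BITS, S_BITS, WAYS = 1, 1, 2
NUM_SETS = 1 << S_BITS

def simulate(trace):
    sets = [[] for _ in range(NUM_SETS)]  # each set holds up to WAYS tags, LRU order
    results = []
    for addr in trace:
        set_index = (addr >> B_BITS) & (NUM_SETS - 1)
        tag = addr >> (B_BITS + S_BITS)
        ways = sets[set_index]
        if tag in ways:
            results.append("Hit")
            ways.remove(tag)
            ways.append(tag)      # move to most-recently-used position
        else:
            results.append("Miss")
            if len(ways) == WAYS:
                ways.pop(0)       # evict least recently used (assumed policy)
            ways.append(tag)
    return results

print(simulate([0, 1, 7, 8, 0]))  # ['Miss', 'Hit', 'Miss', 'Miss', 'Hit']
```

Unlike the direct-mapped run, the final access to 0 hits: set 0 holds both tag 00 (addresses 0–1) and tag 10 (address 8) at once, so address 8 no longer evicts address 0's block.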