We can abstract the memory hierarchy as follows: CPU, Cache, and Main Memory.

Our goal is to get the capacity of main memory while making accesses fast enough that memory “feels” as fast as using registers.

Some Terminology

Main Memory

Main memory (also called RAM) is the last level of memory the CPU checks when looking for data: if the data is not found in the cache, the CPU fetches it from main memory.

Block

A block (or line) is the smallest unit of data stored in the cache. Each block is associated with a tag, which identifies the part of main memory it holds.
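
To make the block idea concrete, here is a minimal C sketch that splits a byte address into a block number and an offset within the block. The 64-byte block size and the example address are assumptions for illustration, not values from these notes.

```c
#include <stdio.h>

/* Minimal sketch: split a byte address into a block number and a block
   offset. The 64-byte block size and the address are assumed examples. */
#define BLOCK_SIZE 64  /* bytes per block (assumption) */

int main(void) {
    unsigned addr = 0x1A2B3C;                  /* example byte address */
    unsigned block_number = addr / BLOCK_SIZE; /* which memory block it falls in */
    unsigned offset       = addr % BLOCK_SIZE; /* byte position inside the block */
    printf("addr 0x%X -> block %u, offset %u\n", addr, block_number, offset);
    return 0;
}
```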

Set

A set is a collection of blocks that can be checked in parallel. If a cache has only one block per set, it is called a direct-mapped cache. A set-associative cache has multiple blocks per set, allowing more flexibility in placing data.

Associativity

Associativity refers to the number of blocks in a set. A direct-mapped cache has one block per set, meaning each memory block maps to exactly one location in the cache. Higher degrees of associativity allow multiple possible locations for each memory block, which generally improves cache performance: an M-way set-associative cache has M blocks per set, and a fully associative cache lets a block reside anywhere in the cache.
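
As a rough illustration, the C sketch below maps the same address to a (set index, tag) pair under a direct-mapped (1-way) and a 4-way set-associative organization. The 32 KiB cache size, 64-byte blocks, and the address are assumed values; the point is that more ways mean fewer sets, so fewer index bits and more tag bits.

```c
#include <stdio.h>

/* Minimal sketch: compute set index and tag for the same address under
   two associativities. Cache size, block size, and address are assumed. */
#define CACHE_SIZE (32 * 1024)  /* total cache capacity in bytes */
#define BLOCK_SIZE 64           /* bytes per block */

int main(void) {
    unsigned addr = 0x1A2B3C;
    unsigned block_number = addr / BLOCK_SIZE;

    int ways[] = {1, 4};  /* 1-way = direct-mapped, 4-way set-associative */
    for (int i = 0; i < 2; i++) {
        unsigned num_sets  = CACHE_SIZE / (BLOCK_SIZE * ways[i]);
        unsigned set_index = block_number % num_sets; /* which set to search */
        unsigned tag       = block_number / num_sets; /* identifies the block in the set */
        printf("%d-way: %u sets, addr 0x%X -> set %u, tag 0x%X\n",
               ways[i], num_sets, addr, set_index, tag);
    }
    return 0;
}
```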

Fetch Size

The fetch size is the maximum amount of memory that can be fetched from the next memory level (such as from main memory to cache). It is typically a multiple of the sub-block size and can be larger or smaller than the block size.

Hit Ratio

The hit ratio is the proportion of cache accesses that result in a cache hit (i.e., the requested data is found in the cache).

Miss Ratio

The miss ratio is the proportion of cache accesses that result in a cache miss (i.e., the requested data is not found in the cache).
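
The hit ratio and miss ratio always sum to 1. The following C sketch ties the earlier definitions together: it runs a made-up address trace through a tiny direct-mapped cache and reports both ratios. The 8 sets, 16-byte blocks, and the trace itself are assumptions chosen only for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Minimal sketch: count hits and misses for a small direct-mapped cache.
   All parameters and the address trace are assumed example values. */
#define NUM_SETS   8
#define BLOCK_SIZE 16

int main(void) {
    unsigned tags[NUM_SETS];
    int valid[NUM_SETS];
    memset(valid, 0, sizeof valid);   /* cache starts empty */

    unsigned trace[] = {0x00, 0x04, 0x10, 0x80, 0x00, 0x84, 0x10, 0x100};
    int n = (int)(sizeof trace / sizeof trace[0]);
    int hits = 0, misses = 0;

    for (int i = 0; i < n; i++) {
        unsigned block = trace[i] / BLOCK_SIZE;
        unsigned set   = block % NUM_SETS;
        unsigned tag   = block / NUM_SETS;
        if (valid[set] && tags[set] == tag) {
            hits++;                   /* requested block is already cached */
        } else {
            misses++;                 /* miss: fetch the block and install it */
            valid[set] = 1;
            tags[set]  = tag;
        }
    }
    printf("hits=%d misses=%d hit ratio=%.2f miss ratio=%.2f\n",
           hits, misses, (double)hits / n, (double)misses / n);
    return 0;
}
```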

Miss Penalty

The miss penalty is the time required to fetch a block from main memory into the cache when a cache miss occurs. This includes the time to access main memory plus any additional delays introduced by intermediate levels of the memory hierarchy.
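
Hit time, miss ratio, and miss penalty combine in the standard average memory access time (AMAT) formula, which is not stated above but follows directly from these definitions: AMAT = hit time + miss ratio × miss penalty. For example, with an assumed 1-cycle hit time, a 5% miss ratio, and a 100-cycle miss penalty, AMAT = 1 + 0.05 × 100 = 6 cycles.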