On chip vs Off chip

On chip Memories are memories that reside inside the chip.

Off chip memories reside outside the chip.

Usually On chip memories have much smaller storage than Off Chip memories, because making a storage inside a chip is a critical design choice and an expensive move.

On chip memories are faster to read and write because the access time for the hardware to execute the command is much faster.

reference) a linkedin post about on chip memory hierarchy:

What is on chip memory hierarchy in semiconductor? | Manish Goyal posted on the topic | LinkedIn

Types of Memory

The Blueprint

In CUDA/GPU programming, the GPU provides several distinct memory spaces, each with different characteristics regarding its location (on-chip vs. off-chip), access speed, scope (which threads can see it), and lifetime. Understanding this hierarchy is crucial for writing high-performance code.

There are total 5 types of memory when we discuss about GPU/CUDA memory: Global Memory, Registers, Constant Memory, Shared Memory, and Local Memory.