CUDA Thread Hierarchy

pay attention to the left image!


A key abstraction of the CUDA programming model, alongside the memory hierarchy.

threads

This is the lowest level of the CUDA thread hierarchy.

A thread executes a stream of instructions.

The hardware resources that carry out arithmetic and logic instructions are called cores (or pipes). Note that each core runs a single thread at a time.

The warp scheduler selects which warp (a group of 32 threads) the cores should execute next.

HW: threads execute on individual cores

blocks (= thread blocks = CTAs; cooperative thread arrays)

FYI: the term CTA is used in the context of PTX/SASS, but it means the same thing as block or thread block.

Each thread has a unique index-based identifier within its thread block. This makes assigning work to individual threads easier.
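A minimal sketch of how the per-thread index is typically used to assign work, assuming a 1D launch. The kernel name write_global_index and the host helper run_write_global_index are made up for illustration; threadIdx, blockIdx and blockDim are the built-in CUDA variables.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Illustrative kernel: each thread writes its own global index into out[].
__global__ void write_global_index(int *out, int n) {
    // threadIdx.x: this thread's index within its block
    // blockIdx.x:  index of the block within the grid
    // blockDim.x:  number of threads per block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;  // guard: the last block may have spare threads
}

// Hypothetical host helper: launches the kernel and copies the result back.
// On allocation failure it returns a vector still filled with -1.
std::vector<int> run_write_global_index(int n, int threads_per_block) {
    std::vector<int> host(n, -1);
    int *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) return host;
    int blocks = (n + threads_per_block - 1) / threads_per_block;  // round up
    write_global_index<<<blocks, threads_per_block>>>(dev, n);
    cudaMemcpy(host.data(), dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

Calling run_write_global_index(1000, 256) would launch 4 blocks of 256 threads; the 24 spare threads in the last block are skipped by the bounds check.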

All threads within a block are scheduled simultaneously onto the same SM. Since they share that SM's L1 cache memory, they can coordinate through shared memory and synchronize with barriers.
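A rough sketch of block-level coordination, assuming a fixed block size of 64 and an input length that is a multiple of it. The names reverse_in_block, run_reverse_in_block and kBlockSize are illustrative only; __shared__ and __syncthreads() are the standard CUDA shared-memory qualifier and block-wide barrier.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kBlockSize = 64;  // assumed block size for this sketch

// Illustrative kernel: reverses each kBlockSize-element chunk of data in place.
__global__ void reverse_in_block(int *data) {
    __shared__ int tile[kBlockSize];  // one copy per block, visible to all its threads
    int t = threadIdx.x;
    int base = blockIdx.x * blockDim.x;
    tile[t] = data[base + t];         // each thread stages one element
    __syncthreads();                  // barrier: wait until the whole tile is written
    data[base + t] = tile[blockDim.x - 1 - t];  // read an element staged by another thread
}

// Hypothetical host helper. Returns the input unchanged if its size is zero,
// not a multiple of kBlockSize, or if device allocation fails.
std::vector<int> run_reverse_in_block(std::vector<int> host) {
    int n = static_cast<int>(host.size());
    int *dev = nullptr;
    if (n == 0 || n % kBlockSize != 0) return host;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) return host;
    cudaMemcpy(dev, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    reverse_in_block<<<n / kBlockSize, kBlockSize>>>(dev);
    cudaMemcpy(host.data(), dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

Without the barrier, a thread could read a tile slot before the thread responsible for it has written it. The barrier only covers threads in the same block, which is why each block reverses its own chunk independently.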

WARNING: the abbreviation "SM" stands for Streaming Multiprocessor, NOT Shared Memory. Don't confuse the two terms (shared memory is a memory space that lives on the SM).