01. Vector Addition

how does tl.program_id work?

→ Returns the id of the current program instance along the given axis.
- what is axis?
- how, when, and by what is the program id (pid) assigned on the GPU hardware?
how does tl.arange(0, BLOCK_SIZE) work?
- arange works similarly to NumPy's arange: it returns the values from the start arg up to, but not including, the end arg (with BLOCK_SIZE=1024 this is 0 ~ 1023)
how does mask work?
how does GPU Programming differ from CPU Programming?
grid = lambda meta: (triton.cdiv(n_elements, meta['BLOCK_SIZE']),) → wtf
add_kernel[grid](x, y, output, n_elements, BLOCK_SIZE=1024) → how does this syntax work?
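
Most of the questions above can be explored without a GPU. Below is a rough pure-Python sketch, not Triton itself: `FakeKernel`, `add_kernel_sim`, and the local `cdiv` are hypothetical stand-ins, and the loop over `pid` replaces what the hardware would run in parallel. It assumes the kernel follows the usual pattern of computing offsets from the program id, building a mask, and doing a masked load/store.

```python
# Pure-Python simulation of the launch; no GPU or Triton required.
# All names here (cdiv, FakeKernel, add_kernel_sim) are hypothetical stand-ins.

def cdiv(a, b):
    """Ceiling division, the same arithmetic as triton.cdiv."""
    return (a + b - 1) // b


class FakeKernel:
    """Mimics the kernel[grid](...) call syntax using __getitem__."""

    def __init__(self, fn):
        self.fn = fn

    def __getitem__(self, grid):
        def launch(*args, **meta):
            # A callable grid receives the meta-parameters and returns a tuple.
            shape = grid(meta) if callable(grid) else grid
            # On a GPU these instances run in parallel; here they run in a loop.
            for pid in range(shape[0]):
                self.fn(pid, *args, **meta)
        return launch


def add_kernel_sim(pid, x, y, output, n_elements, BLOCK_SIZE):
    # pid plays the role of tl.program_id(axis=0)
    block_start = pid * BLOCK_SIZE
    # like block_start + tl.arange(0, BLOCK_SIZE)
    offsets = [block_start + i for i in range(BLOCK_SIZE)]
    # False for lanes that fall past the end of the input
    mask = [off < n_elements for off in offsets]
    for off, keep in zip(offsets, mask):
        if keep:  # masked load/store: out-of-bounds lanes are skipped
            output[off] = x[off] + y[off]


n_elements = 2500  # deliberately not a multiple of BLOCK_SIZE
x = list(range(n_elements))
y = [2 * v for v in x]
output = [None] * n_elements

add_kernel = FakeKernel(add_kernel_sim)
grid = lambda meta: (cdiv(n_elements, meta['BLOCK_SIZE']),)
add_kernel[grid](x, y, output, n_elements, BLOCK_SIZE=1024)
```

With 2500 elements and BLOCK_SIZE=1024 the grid works out to 3 program instances. The last one covers offsets 2048 to 3071, and the mask is what keeps it from touching anything at or beyond index 2500.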

Before understanding the code, we need to understand the basic structure/architecture of a GPU and how the code maps onto it.

Grid = Collection of Blocks

A grid can have 1, 2, or 3 dimensions
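
One way to picture the grid dimensions and the "axis" argument is to list the id tuples a grid would hand out. This is only an illustrative sketch in plain Python; `program_ids` is a hypothetical helper, and the listing order says nothing about the order in which a GPU actually schedules the instances.

```python
from itertools import product


def program_ids(grid):
    """List every program-id tuple for a 1-, 2-, or 3-dimensional grid."""
    if not 1 <= len(grid) <= 3:
        raise ValueError("a launch grid has 1, 2, or 3 dimensions")
    # Position k in each tuple is the value program_id(axis=k) would return
    # for that instance.
    return list(product(*(range(n) for n in grid)))


ids_1d = program_ids((3,))    # vector addition only needs one axis
ids_2d = program_ids((2, 3))  # e.g. tiles of a 2D problem
```

For the 1D grid, each instance only has an axis-0 id, which is why the vector-addition kernel asks for axis 0. For the 2D grid, each instance has one id per axis, 6 instances in total.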