INTRO

These terms and technologies describe physical components of the GPU (the device).

Terms

CUDA Device Architecture

CUDA = Compute Unified Device Architecture

History

Earlier GPUs were designed for highly specific graphics tasks (e.g. vertex processing, pixel shading). It was hard for software programmers to create applications using those fixed, complex pipelines.

CUDA (= a unified architecture for the CPU (host) and GPU (device)) made GPU programming more accessible (GPGPU: General-Purpose computing on GPUs), allowing programmers to execute computations that can take advantage of parallel computing.
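As a minimal sketch of this host/device model (a hypothetical example, not from the original notes), the classic CUDA vector-add kernel shows the pattern: allocate memory on both host and device, copy inputs over, launch many parallel threads, and copy the result back.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread computes one output element: exactly the kind of
// data-parallel work the GPU's many cores are built for.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host (CPU) buffers
    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Device (GPU) buffers
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements
    int threads = 256, blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, n);

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);  // 1.0 + 2.0 = 3.0

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```

The `<<<blocks, threads>>>` launch syntax is where CUDA exposes parallelism directly: one line of host code starts over a million device threads.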

More information about the history of the CUDA hardware architecture can be found in this blog post.

SM(Streaming Multiprocessor)

Figure: internal architecture of an H100 SM

A core computational unit within a GPU that maximizes parallel processing by executing a large number of threads concurrently.


A GPU device (like an NVIDIA A100, RTX 4090, etc.) is built from multiple Streaming Multiprocessors (SMs).
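The SM count per device can be queried at runtime (a hypothetical sketch using the CUDA runtime API; device index 0 is assumed):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    // Query properties of device 0; multiProcessorCount is the number of SMs
    cudaGetDeviceProperties(&prop, 0);
    printf("%s has %d SMs\n", prop.name, prop.multiProcessorCount);
    return 0;
}
```

For example, an A100 reports 108 SMs, while an RTX 4090 reports 128; thread blocks from a kernel launch are distributed across these SMs for concurrent execution.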