Pruning
Make neural networks smaller by removing synapses and neurons
Mostly done by removing unnecessary weights (setting them to zero), i.e., pruning the network
$$
\arg \min_{W_p} L(x; W_p) \\
\text{subject to } \|W_p\|_0 < N
$$
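Here $N$ is the budget on the number of nonzero weights. The objective itself doesn't say how to pick which weights to drop; a common heuristic is magnitude pruning: keep the $N$ largest-magnitude weights and zero out the rest. A minimal NumPy sketch (function name and shapes are illustrative):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, num_nonzero: int) -> np.ndarray:
    """Keep only the num_nonzero largest-magnitude weights, so ||W_p||_0 <= num_nonzero."""
    flat = np.abs(weights).ravel()
    if num_nonzero >= flat.size:
        return weights.copy()
    # Magnitude of the num_nonzero-th largest weight is the keep threshold.
    threshold = np.partition(flat, -num_nonzero)[-num_nonzero]
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

W = np.random.randn(8, 8)
W_p = magnitude_prune(W, num_nonzero=16)
print(np.count_nonzero(W_p))  # ~16 (ties at the threshold may keep a few extra)
```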
Pruning Granularity
Fine-grained / Unstructured
- flexible pruning indices
- usually achieves a larger compression ratio, since we can flexibly find “redundant” weights
- can deliver a speedup on some specialized hardware, but generally not on GPUs (see the sketch below)
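A small sketch of what “flexible pruning indices” means in practice: any individual weight can be zeroed, so the surviving entries form an irregular index pattern that must be stored explicitly (the 75% sparsity ratio and shapes below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))

sparsity = 0.75                               # prune 75% of the weights
threshold = np.quantile(np.abs(W), sparsity)  # per-tensor magnitude threshold
mask = np.abs(W) > threshold                  # any position can be pruned

rows, cols = np.nonzero(mask)                 # irregular list of surviving indices
print(list(zip(rows.tolist(), cols.tolist())))
# Dense GPU kernels still read the full matrix, so this irregular pattern
# saves compute only on hardware that can exploit unstructured sparsity.
```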
Coarse-grained / Structured / Pattern-based
- N:M sparsity: in each group of M contiguous elements, N of them are pruned (see the sketch after this list)
- supported by NVIDIA’s Ampere GPU architecture (2:4 sparsity), which delivers up to a 2x speedup
- usually maintains accuracy
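A sketch of the N:M pattern for the 2:4 case, using magnitude-based selection within each group (the metadata layout that Sparse Tensor Cores actually require is omitted; function name is illustrative):

```python
import numpy as np

def prune_n_of_m(weights: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    """In every group of m contiguous weights, zero out the n smallest-magnitude ones."""
    assert weights.size % m == 0
    groups = weights.reshape(-1, m)
    # Indices of the n smallest-magnitude elements within each group.
    prune_idx = np.argsort(np.abs(groups), axis=1)[:, :n]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, prune_idx, False, axis=1)
    return (groups * mask).reshape(weights.shape)

W = np.random.randn(2, 8)
W_24 = prune_n_of_m(W, n=2, m=4)  # every 4 contiguous weights keep exactly 2 nonzeros
```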