값비싼 Diffusion model을 받드는 저비용 MLOps

단점

GAN 모델과 달리 Diffusion은 여러번 denoising 과정을 거치기 때문에 (about 50 times in inference time) 느리다!

FlashAttention은 Attention에서 발생하는 IO bottleneck을 줄이는 방식