Attention Is All You Need

Reference

https://nlp.seas.harvard.edu/annotated-transformer/

KV Cache
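
A minimal NumPy sketch of the idea (my own illustration, not from the reference): during decoding, the key/value projections of past tokens are cached so each new step only computes attention for the newest query. Names like `KVCache` and `decode_step` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class KVCache:
    """Stores K and V rows for all past tokens of one attention head (sketch)."""
    def __init__(self, d_model):
        self.K = np.zeros((0, d_model))
        self.V = np.zeros((0, d_model))

    def append(self, k, v):
        # One new row per decoded token; past rows are never recomputed.
        self.K = np.vstack([self.K, k])
        self.V = np.vstack([self.V, v])

def decode_step(q, k, v, cache):
    """Attend the newest query against all cached keys/values."""
    cache.append(k, v)
    d_k = q.shape[-1]
    scores = q @ cache.K.T / np.sqrt(d_k)   # (1, t)
    return softmax(scores) @ cache.V        # (1, d_model)

# Usage: three decode steps, each reusing the growing cache.
rng = np.random.default_rng(0)
cache = KVCache(d_model=4)
for _ in range(3):
    q, k, v = rng.normal(size=(3, 1, 4))
    out = decode_step(q, k, v, cache)
print(out.shape)  # (1, 4)
```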

A summary of questions about the transformer

Understanding attention easily
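
The paper's core formula is Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A few lines of NumPy make it concrete (a sketch for intuition, not the reference implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """softmax(Q K^T / sqrt(d_k)) V, the core formula of the paper."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # similarity of each query to each key
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # blocked positions get ~zero weight
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                         # weighted average of the values

# Causal mask: token i may only attend to tokens 0..i.
T, d = 5, 8
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, T, d))
causal = np.tril(np.ones((T, T), dtype=bool))
out = scaled_dot_product_attention(Q, K, V, mask=causal)
print(out.shape)  # (5, 8)
```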

prefill vs decode
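
A self-contained NumPy sketch of the two phases (my own illustration; identity "projections" and variable names are hypothetical): prefill runs attention over the whole prompt in one parallel, compute-bound pass and fills the KV cache; decode then generates one token per step, appending a single K/V row and attending against the cache, which is memory-bound.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
T, d = 4, 8                       # prompt length, head dimension

# Prefill: score the whole prompt at once (one big, parallel matmul),
# apply a causal mask, and keep K/V as the initial cache.
X = rng.normal(size=(T, d))       # stand-in for the prompt's hidden states
Q, K, V = X, X, X                 # identity "projections" keep the sketch short
scores = Q @ K.T / np.sqrt(d)
scores = np.where(np.tril(np.ones((T, T), dtype=bool)), scores, -1e9)
prefill_out = softmax(scores) @ V
cache_K, cache_V = K, V

# Decode: one token per step; only the newest query is computed,
# while K/V for all past tokens come from the cache.
for _ in range(3):
    x_new = rng.normal(size=(1, d))
    q, k, v = x_new, x_new, x_new
    cache_K = np.vstack([cache_K, k])
    cache_V = np.vstack([cache_V, v])
    step_out = softmax(q @ cache_K.T / np.sqrt(d)) @ cache_V

print(prefill_out.shape, step_out.shape)  # (4, 8) (1, 8)
```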