Attention is All You Need
Reference
https://nlp.seas.harvard.edu/annotated-transformer/
KV Cache
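The idea behind a KV cache: during autoregressive decoding, the key and value projections of already-processed tokens are stored, so each step only projects the newest token and attends against the stored history instead of recomputing K and V for the whole prefix. A minimal single-head sketch, assuming PyTorch; the `attend` helper, weights, and shapes are illustrative, not from the reference:

```python
import torch

def attend(q, k, v):
    # Scaled dot-product attention for a single head:
    # softmax(q k^T / sqrt(d_k)) v
    scores = q @ k.transpose(-2, -1) / k.size(-1) ** 0.5
    return torch.softmax(scores, dim=-1) @ v

d = 64
w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))

# The cache holds one key row and one value row per processed token.
k_cache = torch.empty(0, d)
v_cache = torch.empty(0, d)

for step in range(5):
    x_new = torch.randn(1, d)                    # newest token's hidden state
    k_cache = torch.cat([k_cache, x_new @ w_k])  # append its key exactly once
    v_cache = torch.cat([v_cache, x_new @ w_v])  # append its value exactly once
    out = attend(x_new @ w_q, k_cache, v_cache)  # query attends over all cached K/V
```

The trade-off is memory: the cache grows linearly with sequence length, one K row and one V row per token per layer.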
Questions about the Transformer, organized
Understanding Attention Easily
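The core operation of the paper is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal sketch with an optional causal mask, assuming PyTorch; the toy shapes are illustrative:

```python
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    scores = q @ k.transpose(-2, -1) / k.size(-1) ** 0.5
    if mask is not None:
        scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Toy self-attention: 4 tokens of dimension 8, with a causal mask so
# each position attends only to itself and earlier positions.
t, d = 4, 8
x = torch.randn(t, d)
causal = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
out = scaled_dot_product_attention(x, x, x, mask=causal)
print(out.shape)  # torch.Size([4, 8])
```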
prefill vs decode
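Prefill and decode are the two phases of autoregressive inference: prefill runs the whole prompt through the model in one parallel pass and fills the KV cache (typically compute-bound), while decode generates one token at a time, appending a single cache row per step (typically memory-bandwidth-bound). A sketch of the asymmetry, reusing the cache layout above; weights and shapes are again illustrative:

```python
import torch

d = 64
w_k, w_v = torch.randn(d, d), torch.randn(d, d)

# Prefill: project the entire prompt in one parallel pass, filling the cache.
prompt = torch.randn(16, d)   # 16 prompt tokens (toy hidden states)
k_cache = prompt @ w_k
v_cache = prompt @ w_v

# Decode: one token per step; each step appends a single row and
# would attend over everything cached so far.
for _ in range(4):
    x_new = torch.randn(1, d)
    k_cache = torch.cat([k_cache, x_new @ w_k])
    v_cache = torch.cat([v_cache, x_new @ w_v])
print(k_cache.shape)  # torch.Size([20, 64])
```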