https://www.youtube.com/watch?v=5ZlavKF_98U

Problem being solved: how to serve LLMs efficiently and cost-effectively.

Inference process of an LLM