https://www.youtube.com/watch?v=89NuzmKokIk


concurrent user requests

inference engine

LLM Inference Pain Points

Model inference performance assessment is time-consuming and fragmented.

Guaranteeing that a given model and hardware profile can sustain Inference Service Level Objectives (SLOs) as request concurrency scales.
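
A minimal sketch of one way to sanity-check SLO compliance from benchmark data, assuming p95 targets for time-to-first-token (TTFT) and inter-token latency (ITL). All thresholds and measurements below are illustrative placeholders, not figures from the talk.

```python
# Hypothetical sketch: check whether measured latency percentiles at each
# concurrency level stay within assumed inference SLO targets.
from statistics import quantiles

# Assumed SLO targets: p95 TTFT <= 500 ms, p95 inter-token latency <= 50 ms.
SLO_TTFT_MS = 500
SLO_ITL_MS = 50

# Placeholder measurements: concurrency level -> list of (ttft_ms, itl_ms) samples.
measurements = {
    8:  [(180, 22), (210, 25), (250, 28), (300, 30)],
    32: [(320, 35), (400, 42), (480, 48), (620, 55)],
}

def p95(values):
    """95th percentile of a sample (inclusive method, works for small samples)."""
    return quantiles(values, n=20, method="inclusive")[-1]

for concurrency, samples in measurements.items():
    ttft_p95 = p95([s[0] for s in samples])
    itl_p95 = p95([s[1] for s in samples])
    ok = ttft_p95 <= SLO_TTFT_MS and itl_p95 <= SLO_ITL_MS
    print(f"concurrency={concurrency}: p95 TTFT={ttft_p95:.0f} ms, "
          f"p95 ITL={itl_p95:.0f} ms -> {'meets SLO' if ok else 'violates SLO'}")
```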

Cost estimation for real-world workloads is often opaque and requires working backwards from inference performance to token throughput to dollar cost.
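
A hypothetical back-of-the-envelope version of that backwards math, mapping measured throughput to a cost per million output tokens. The GPU hourly price and throughput numbers are assumptions for illustration only.

```python
# Assumed inputs, not real benchmark or pricing data.
GPU_HOURLY_COST_USD = 4.00        # assumed cloud price per GPU-hour
GPUS_PER_REPLICA = 1
THROUGHPUT_TOKENS_PER_SEC = 2500  # assumed aggregate output tokens/sec per replica

tokens_per_hour = THROUGHPUT_TOKENS_PER_SEC * 3600
cost_per_hour = GPU_HOURLY_COST_USD * GPUS_PER_REPLICA
cost_per_million_tokens = cost_per_hour / (tokens_per_hour / 1_000_000)

print(f"tokens/hour per replica: {tokens_per_hour:,}")
print(f"estimated cost per 1M output tokens: ${cost_per_million_tokens:.2f}")
```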

Bias

“glue” incident

synthetic, AI-generated data

How to prevent issues at scale
