https://www.youtube.com/watch?v=89NuzmKokIk

Core question: how does an inference engine behave under concurrent user requests?
Model inference performance assessment is time-consuming and fragmented.
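
A minimal sketch of what such an assessment can look like in practice, assuming an OpenAI-compatible /v1/completions endpoint served locally (e.g., by vLLM); the URL, model name, prompt, and concurrency levels are illustrative placeholders, not from the source:

```python
# Minimal concurrency benchmark sketch. Assumes an OpenAI-compatible
# /v1/completions endpoint at http://localhost:8000 (e.g., a local vLLM
# server); model name and prompt are placeholders.
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, quantiles

URL = "http://localhost:8000/v1/completions"  # assumed endpoint
PAYLOAD = {
    "model": "my-model",        # placeholder model name
    "prompt": "Hello, world!",
    "max_tokens": 64,
}

def one_request() -> float:
    """Send one completion request; return end-to-end latency in seconds."""
    req = urllib.request.Request(
        URL,
        data=json.dumps(PAYLOAD).encode(),
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return time.perf_counter() - start

def run(concurrency: int, total: int) -> None:
    """Fire `total` requests with `concurrency` workers; report latency stats."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda _: one_request(), range(total)))
    p95 = quantiles(latencies, n=20)[-1]  # 95th-percentile latency
    print(f"concurrency={concurrency:>3} mean={mean(latencies):.3f}s p95={p95:.3f}s")

if __name__ == "__main__":
    for c in (1, 4, 16):  # sweep concurrent-user levels
        run(concurrency=c, total=32)
```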
It is hard to guarantee that a given model and hardware profile can maintain inference Service Level Objectives (SLOs) as load scales.
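
One way to turn that guarantee into a measurement is to sweep concurrency until the SLO breaks. A sketch under assumptions: `measure_p95_latency` is a hypothetical hook that load-tests at a given concurrency (e.g., with the benchmark above) and returns p95 end-to-end latency in seconds, and the targets are illustrative:

```python
# Sketch: find the highest concurrency a model + hardware profile sustains
# while meeting a latency SLO. `measure_p95_latency` is a hypothetical hook;
# the 2.0 s target and 256-worker cap are illustrative, not from the source.
from typing import Callable

def max_slo_concurrency(
    measure_p95_latency: Callable[[int], float],
    slo_p95_s: float = 2.0,
    max_concurrency: int = 256,
) -> int:
    best = 0
    concurrency = 1
    while concurrency <= max_concurrency:
        if measure_p95_latency(concurrency) <= slo_p95_s:
            best = concurrency   # SLO still holds at this load
            concurrency *= 2     # try double the load
        else:
            break                # SLO violated; stop scaling up
    return best
```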
Cost estimation for real-world workloads is often a mystery: it requires working backwards from measured inference performance to token throughput to dollar cost.
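
That backwards math can be made explicit. A worked sketch in which the GPU price, throughput, and workload numbers are all assumed for illustration:

```python
# Worked cost sketch: map measured throughput and hardware price to a
# per-token cost, then forward to a workload estimate. All numbers here
# (GPU price, throughput, workload size) are assumed for illustration.
GPU_PRICE_PER_HOUR = 2.00  # assumed $/GPU-hour
THROUGHPUT_TOK_S = 1500    # assumed sustained output tokens/sec within SLO

tokens_per_hour = THROUGHPUT_TOK_S * 3600  # 5.4M tokens/hour
cost_per_m_tokens = GPU_PRICE_PER_HOUR / tokens_per_hour * 1_000_000
print(f"${cost_per_m_tokens:.3f} per 1M output tokens")  # ~$0.370

# Forward estimate for a concrete workload.
requests_per_day = 100_000  # assumed request volume
avg_tokens = 400            # assumed output tokens per request
daily_cost = requests_per_day * avg_tokens / 1_000_000 * cost_per_m_tokens
print(f"~${daily_cost:.2f}/day")  # 40M tokens/day -> ~$14.81
```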
Bias in model outputs and in the data used to train and evaluate them.
The "glue" incident: an AI-generated answer that surfaced a joke suggestion to put glue on pizza, a symptom of low-quality source data.
The growing share of synthetic, AI-generated data feeding back into training and evaluation.
