https://youtu.be/-amEtpnuJ70?si=FxN0ppAtxOsLFNiv
MLOps
-
gpu server, infra, cluster management
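- large company (e.g. 1500 servers * 8 GPUs)
- which servers are idle
- whether they are being used efficiently
- whether the NVIDIA driver and CUDA are installed
- OS management
these and similar things are often not handled efficiently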
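A minimal sketch of the "which servers are idle" and "is the driver installed" checks above, for a single node. It assumes `nvidia-smi` is on the PATH and is queried with `--query-gpu=index,utilization.gpu,memory.used,driver_version --format=csv,noheader,nounits`. The function names and the idle thresholds (5% utilization, 500 MiB) are made up for illustration, not taken from the video.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi, in the order they are parsed below.
QUERY_FIELDS = "index,utilization.gpu,memory.used,driver_version"


def parse_gpu_report(csv_text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts.

    Assumes every field is numeric where expected; some GPUs report
    "[N/A]" for utilization, which this sketch does not handle.
    """
    gpus = []
    for line in csv_text.strip().splitlines():
        index, util, mem_used, driver = [f.strip() for f in line.split(",")]
        gpus.append({
            "index": int(index),
            "util_pct": int(util),
            "mem_used_mib": int(mem_used),
            "driver": driver,
        })
    return gpus


def idle_gpus(gpus, max_util_pct=5, max_mem_mib=500):
    """Return indices of GPUs under both (hypothetical) idle thresholds."""
    return [
        g["index"]
        for g in gpus
        if g["util_pct"] <= max_util_pct and g["mem_used_mib"] <= max_mem_mib
    ]


def query_local_gpus():
    """Return parsed GPU info, or None if nvidia-smi is missing or fails.

    None is itself a useful signal: the driver is likely not installed
    or not working on this node.
    """
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
    except subprocess.CalledProcessError:
        return None
    return parse_gpu_report(result.stdout)
```

On a real cluster something like `query_local_gpus()` would run on every node (for example from a cron job or agent) and report to a central place; at the 1500-server scale mentioned above, a scheduler or monitoring stack would normally take over this role.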
-
mlops pipeline, system dev
- turn Jupyter notebook proofs of concept into real, runnable Python scripts
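A rough sketch of what that conversion tends to look like: values that were hard-coded in notebook cells become command-line arguments, cell bodies become functions, and the result is written out instead of displayed inline. The toy `train` function (gradient descent on (w - 3)^2) and the flag names are placeholders for illustration, not something from the video.

```python
import argparse
import json
import logging

log = logging.getLogger("train_script")


def train(learning_rate, steps):
    """Stand-in for the notebook's training cells.

    Minimizes the toy objective (w - 3)^2 with plain gradient descent.
    """
    w = 0.0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)
        w -= learning_rate * grad
    return w


def build_parser():
    # Values that used to be edited by hand in a cell become flags.
    parser = argparse.ArgumentParser(description="Toy training script")
    parser.add_argument("--learning-rate", type=float, default=0.1)
    parser.add_argument("--steps", type=int, default=100)
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    logging.basicConfig(level=logging.INFO)
    log.info("starting: lr=%s steps=%s", args.learning_rate, args.steps)
    w = train(args.learning_rate, args.steps)
    # Emit a machine-readable result instead of relying on cell output.
    print(json.dumps({"w": w, "steps": args.steps}))
    return 0


if __name__ == "__main__":
    main()
```

Run as `python train.py --steps 200 --learning-rate 0.05` (file name assumed). Keeping `main(argv)` callable with an explicit argument list makes the script easy to test and to call from a pipeline step.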
-
serving infra, server

- model optimization
- post-training modules
- gpu speculation, serving system development
- most interaction with AI engineers and SW engineers