Job Description
Position: DevOps / MLOps Engineer
Experience Level: Mid-level to Senior
We are looking for a DevOps / MLOps Engineer to design, operate, and scale the infrastructure for running heavy AI models for digital content generation. This role is focused on GPU-based inference, on-demand execution, and cost/performance optimization, and sits at the practical intersection of MLOps and multimodal AI systems.
Given the growing global and local demand for fast and affordable content generation, this role has a direct and measurable impact on the final product.
Responsibilities
- Design and implement end-to-end inference-focused pipelines for AI-based digital content generation
- Build automation for GPU job execution, including:
  - On-demand instance start/stop
  - Scheduling and queue management
- Run and optimize heavy AI models on GPU infrastructure
- Monitor GPU utilization, memory, and system resources to maximize efficiency and control costs
- Collaborate closely with the AI team to test models and review outputs and artifacts
- Document the infrastructure and continuously improve pipeline reliability
- Propose and implement scale-up / scale-out and multi-GPU optimization strategies
Required Skills & Qualifications
- Hands-on experience with GPU cloud providers (Vast.ai, RunPod, Lambda Labs, or similar)
- Strong experience with Docker and GPU-based containerized workloads
- Working knowledge of Kubernetes for running and managing GPU workloads
- Experience building or operating ML inference pipelines (multimodal experience is a plus)
- Strong understanding of GPU optimization and memory management
- Familiarity with CI/CD concepts and workflows
- Experience with or exposure to monitoring tools such as Prometheus, Grafana, or equivalents
- Strong sense of ownership, structured thinking, and ability to work collaboratively
Nice to Have
- Experience with job orchestration tools (Airflow, Prefect)
- Experience reducing cold start latency in GPU-based services
- Proven GPU cost optimization experience
- Infrastructure documentation using MkDocs, Sphinx, or similar tools
What We Offer
✨ Hands-on work with real-world multimodal AI systems
✨ Flexible working hours and a startup-friendly environment
✨ Direct impact on a production AI product
✨ Clear growth path and exposure to real MLOps challenges
✨ Learning budget for courses and conferences
✨ Collaboration with a fast, pragmatic, and technical team
✨ ESOP (equity incentives) for high-impact contributors