About the Role
We’re looking for a Machine Learning Platform Engineer to design and scale the backbone of our AI initiatives. You’ll build the core infrastructure and tools that enable data scientists and engineers to train, deploy, and monitor models at enterprise scale.
Key Responsibilities
- Platform Architecture: Design and implement high-performance ML pipelines, feature stores, and model registries.
- MLOps & Automation: Develop CI/CD workflows for ML, including automated training, testing, and deployment.
- Scalability: Optimize distributed training and inference using Kubernetes, Spark, or Ray.
- Observability: Create robust monitoring and alerting for model drift, latency, and resource utilization.
- Collaboration: Partner with data scientists, DevOps, and security teams to deliver a seamless end-to-end machine learning environment.
- Innovation: Evaluate emerging tools and frameworks (e.g., Kubeflow, MLflow, Feast) to keep the platform state-of-the-art.
Qualifications
- 4–7+ years of software engineering experience, with at least 2 years focused on ML infrastructure or MLOps.
- Strong coding skills in Python and at least one systems language (Go, Java, or C++).
- Hands-on experience with cloud services (AWS, GCP, or Azure) and container orchestration (Kubernetes, Docker).
- Familiarity with data engineering tools (Kafka, Airflow, Spark) and model-serving technologies (TensorFlow Serving, TorchServe).
- Solid understanding of CI/CD, distributed systems, and security best practices.
- Excellent communication skills and a passion for enabling others through platform engineering.
