Please let Baseten know you found this job on RemoteYeah. This helps us grow 🌱.
Description:
As a Site Reliability Engineer at Baseten, you will be responsible for envisioning and building robust systems and processes to ensure the scalability, reliability, and efficiency of the infrastructure.
Tasks include automating deployments, monitoring systems, optimizing performance, and managing incidents.
You will collaborate closely with users to understand their challenges in operationalizing ML, onboard them onto the platform, and use feedback to enhance Baseten.
The role involves working on engaging problems and contributing to the growth of the ML infrastructure market.
Requirements:
Experience in building and maintaining scalable infrastructure is required for this position.
Extensive knowledge of Kubernetes is essential.
Ability to implement automation, especially for managing CI/CD pipelines, is necessary.
Proficiency in establishing standards and best practices for reliability and performance is a must.
Prior ML experience is not mandatory, but a willingness to learn about it is expected.
Bonus points for experience with relevant OSS observability tools such as Prometheus, the ELK stack, the Grafana stack, and OpenTelemetry.
Benefits:
Competitive compensation package, including unlimited PTO, 401(k), and covered healthcare premiums.
Opportunity to be part of a rapidly growing startup in the ML field.
Inclusive and supportive work culture that encourages learning and development.
Exposure to various ML startups for unparalleled learning and networking opportunities.