Please, let Baseten know you found this job
on RemoteYeah.
This helps us grow 🌱.
Description:
As a Site Reliability Engineer, you will envision and build robust systems and processes that ensure our infrastructure is scalable, reliable, and efficient.
Your responsibilities will include automating deployments, monitoring systems, optimizing performance, and managing incidents.
You will work closely with users to learn from their past struggles in operationalizing machine learning and help onboard them onto our platform.
You will build and maintain scalable infrastructure to support the deployment and operation of machine learning models.
Establishing standards and best practices for reliability and performance across the infrastructure will be part of your role.
You will automate processes, particularly for managing CI/CD pipelines.
You will own products and projects end-to-end, functioning as both an engineer and a project manager, focusing on user empathy, project specification, and execution.
Collaborating with cross-functional teams to understand project requirements and translating them into technical solutions will be essential.
Mentoring junior team members and contributing to knowledge sharing within the organization will be expected.
You will navigate ambiguity and exercise good judgment on tradeoffs and tools needed to solve problems, avoiding unnecessary complexity.
Demonstrating pride, ownership, and accountability for your work, while expecting the same from your teammates, is crucial.
Requirements:
A Bachelor's, Master's, or Ph.D. degree in Computer Science, Engineering, Mathematics, or a related field is required.
You must have 3+ years of professional work experience in a fast-paced, high-growth environment.
Extensive experience with Kubernetes is necessary.
You should have experience in building and maintaining scalable infrastructure.
Experience with infrastructure-as-code tools (e.g., Terraform, CloudFormation, Pulumi) and CI/CD tooling (e.g., GitHub Actions, GitLab CI, CircleCI, Jenkins) is required.
Relevant open-source observability experience (Prometheus, ELK stack, Grafana stack, OpenTelemetry) is a plus.
You must have the ability to own projects end-to-end, from project specification to execution.
No prior machine learning experience is required, but you should be open to learning about it.
Benefits:
You will receive a competitive compensation package that includes unlimited PTO, a 401(k) plan, and covered healthcare premiums.
This position offers a unique opportunity to be part of a rapidly growing startup in one of the most exciting engineering fields of our era.
You will be part of an inclusive and supportive work culture that fosters learning and growth.
Exposure to a variety of machine learning startups will provide you with unparalleled learning and networking opportunities.